Abstract

SAFE is a clean-slate design for a highly secure computer sys- 
tem, with pervasive mechanisms for tracking and limiting infor- 
mation flows. At the lowest level, the SAFE hardware supports 
fine-grained programmable tags, with efficient and flexible prop- 
agation and combination of tags as instructions are executed. The 
operating system virtualizes these generic facilities to present an 
information-flow abstract machine that allows user programs to la- 
bel sensitive data with rich confidentiality policies. We present a 
formal, machine-checked model of the key hardware and software 
mechanisms used to control information flow in SAFE and an end- 
to-end proof of noninterference for this model. 

Categories and Subject Descriptors D.4.6 [Security and Protec- 
tion]: Information flow controls; D.2.4 [Software Engineering]: 
Software/Program Verification 

Keywords security; clean-slate design; tagged architecture; 
information-flow control; formal verification; refinement 

1. Introduction 

The SAFE design is motivated by the conviction that the insecurity 
of present-day computer systems is due in large part to legacy 
design decisions left over from an era of scarce hardware resources. 
The time is ripe for a complete rethink of the entire system stack 
with security as the central focus. In particular, designers should be 
willing to spend more of the abundant processing power available 
on today's chips to improve security. 

A key feature of SAFE is that every piece of data, down to the 
word level, is annotated with a tag representing policies that govern 
its use. While the tagging mechanism is very general, one partic- 
ularly interesting use of tags is for representing information-flow 
control (IFC) policies. For example, an individual record might be 
tagged "This information should only be seen by principals Alice 
or Bob," a function pointer might be tagged "This code is trusted to 
work with Carol's secrets," or a string might be tagged "This came 
from the network and has not been sanitized yet." Such tags repre- 
senting IFC policies can involve arbitrary sets of principals, and 
principals themselves can be dynamically allocated to represent an 
unbounded number of entities within and outside the system. 

At the programming-language level, rich IFC policies have been 
extensively explored, with many proposed designs for static [35, 
60, 61, 65, 69, 86] and dynamic [4-7, 32, 36, 64, 67, 70, 79] en- 
forcement mechanisms and a huge literature on their formal prop- 
erties [35, 69, etc.]. Similarly, operating systems with information- 
flow tracking have been a staple of the OS literature for over a 
decade [29, 45, 46, 57, 87, 87]. But progress at the hardware level 
has been more limited, with most proposals concentrating on hardware
acceleration for taint-tracking schemes [14, 20, 21, 25, 26,
80, 82]. SAFE extends the state of the art in two significant ways. 



First, the SAFE machine offers hardware support for sound and ef- 
ficient purely-dynamic tracking of both explicit and implicit flows 
(i.e., information leaks through both data and control flow) for ar- 
bitrary machine code programs — not just programs accepted by 
static analysis, or produced by translation or transformation. More- 
over, rather than using just a few "taint bits," SAFE associates a 
word-sized tag to every word of data in the machine — both memory 
and registers. In particular, SAFE tags can be pointers to arbitrary 
data structures in memory. The interpretation of these tags is left en- 
tirely to software: the hardware just propagates tags from operands 
to results as each instruction is executed, following software-defined 
rules. Second, the SAFE design has been informed from the start by 
an intensive effort to formalize critical properties of its key mech- 
anisms and produce machine-checked proofs, in parallel with the 
design and implementation of its hardware and system software. 
Though some prior work (surveyed in §12) shares some of these 
aims, to the best of our knowledge no project has attempted this 
combination of innovations. 

Abstractly, the tag propagation rules in SAFE can be viewed as 
a partial function from argument tuples of the form (opcode, pc tag,
argument1 tag, argument2 tag, ...) to result tuples of the form (new
pc tag, result tag), meaning "if the next instruction to be executed
is opcode, the current tag of the program counter (PC) is pc tag,
and the arguments expected by this opcode are tagged argument1
tag, etc., then executing the instruction is allowed and, in the new
state of the machine, the PC should be tagged new pc tag and any 
new data created by the instruction should be tagged result tag." 
(The individual argument-result pairs in this function's graph are 
called rule instances, to distinguish them from the symbolic rules 
used at the software level.) In general, the graph of this function 
in extenso will be huge; so, concretely, the hardware maintains a 
cache of recently-used rule instances. On each instruction dispatch 
(in parallel with the logic implementing the usual behavior of 
the instruction — e.g., addition), the hardware forms an argument 
tuple as described above and looks it up in the rule cache. If the 
lookup is successful, the result tuple includes a new tag for the 
PC and a tag for the result of the instruction (if any); these are 
combined with the ordinary results of instruction execution to yield 
the next machine state. Otherwise, if the lookup is unsuccessful, the 
hardware invokes a cache fault handler — a trusted piece of system 
software with the job of checking whether the faulting combination 
of tags corresponds to a policy violation or whether it should be 
allowed. In the latter case, an appropriate rule instance specifying 
tags for the instruction's results is added to the cache, and the 
faulting instruction is restarted. Thus, the hardware is generic and 
the interpretation of policies (e.g., IFC, memory safety or control 
flow integrity) is programmed in software, with the results cached 
in hardware for common-case efficiency. 
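The lookup-then-fault cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `RuleCache`, `PolicyViolation`, and `taint_handler`, the use of an unbounded dictionary for the cache, and the toy 0/1 taint policy are all hypothetical and are not part of the SAFE design.

```python
from typing import Callable, Dict, Optional, Tuple

# (opcode, pc tag, three argument tags); unused slots hold T_D (assumed encoding).
ArgTuple = Tuple[str, int, int, int, int]
# (new pc tag, result tag)
ResTuple = Tuple[int, int]

T_D = -1  # placeholder tag for argument slots an opcode does not use


class PolicyViolation(Exception):
    """Raised when the fault handler rejects a combination of tags."""


class RuleCache:
    """Cache of rule instances, filled on demand by a software fault handler."""

    def __init__(self, fault_handler: Callable[[ArgTuple], Optional[ResTuple]]):
        self.entries: Dict[ArgTuple, ResTuple] = {}
        self.fault_handler = fault_handler
        self.misses = 0

    def lookup(self, args: ArgTuple) -> ResTuple:
        if args not in self.entries:
            # Cache miss: ask the trusted handler whether this is allowed.
            self.misses += 1
            result = self.fault_handler(args)
            if result is None:
                raise PolicyViolation(args)
            # Install the rule instance; the instruction is then "restarted",
            # which here simply means falling through to the hit path.
            self.entries[args] = result
        return self.entries[args]


def taint_handler(args: ArgTuple) -> Optional[ResTuple]:
    """Toy policy: tags are 0 (clean) or 1 (tainted); only add is allowed."""
    opcode, pc, t1, t2, _t3 = args
    if opcode == "add":
        return (pc, max(t1, t2))
    return None
```

With this sketch, the first `lookup(("add", 0, 0, 1, T_D))` takes the slow path through the handler and later lookups of the same tuple are served from the cache, which mirrors the common-case efficiency argument in the text.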

The first contribution of this paper is to explain and formalize, in 
Coq, the key ideas in this design via a simplified model of the SAFE 
machine, embodying its tagging mechanisms in a distilled form and
focusing on enforcing IFC using these general mechanisms. In §2, 
we outline the features of the full SAFE system and enumerate the 
most significant simplifications in our model. To streamline the ex- 
position, most of the paper describes a further-simplified version 
of the system, deferring to §11 the discussion of the more sophisticated
memory model and IFC label representation that we have
actually formalized in Coq. We begin by defining a very simple 
abstract IFC machine with a built-in, purely dynamic IFC enforce- 
ment mechanism and an abstract lattice of IFC labels (§3). We then 
show, in three steps, how this abstract machine can be implemented 
using the low-level mechanisms we propose. The first step intro- 
duces a symbolic IFC rule machine that reorganizes the semantics 
of the abstract machine, splitting out the IFC enforcement mech- 
anism into a separate judgment parameterized by a symbolic IFC 
rule table (§4). The second step defines a generic concrete machine 
(§5) that provides low-level support for efficiently implementing 
many different high-level policies (IFC and others) with a combi- 
nation of a hardware rule cache and a software fault handler. The 
final step instantiates the concrete machine with a concrete fault 
handler enforcing IFC. We do this using an IFC fault handler gen- 
erator (§6), which compiles the symbolic IFC rule table into a se- 
quence of machine instructions implementing the IFC enforcement 
judgment. 

Our second contribution is a machine-checked proof that this 
simplified SAFE system is correct and secure, in the sense that 
user code running on the concrete machine equipped with the IFC 
fault handler behaves the same way as on the abstract machine and 
enjoys the standard noninterference property that "high inputs do 
not influence low outputs." The interplay of the concrete machine 
and fault handler is complex, so some proof abstraction is essen- 
tial. (Previous projects such as the CompCert compiler [48], the 
seL4 microkernel [44, 57], and the RockSalt SFI checker [56] have 
demonstrated the need for significant attention to organization in 
similar proofs.) In our proof architecture, a first abstraction layer 
is based on refinement. This allows us to reason in terms of a high- 
level view of memory, ignoring the concrete implementation of IFC 
labels, while setting up the intricate indistinguishability relation 
used in the noninterference proof. A second layer of abstraction 
is required for reasoning about the correctness of the fault handler. 
Here, we rely on a verified custom Hoare logic that abstracts from 
low-level machine instructions into a reusable set of verified struc- 
tured code generators. 

In §7 we prove that the IFC fault handler generator correctly 
compiles a symbolic IFC rule table and a concrete representation 
of an abstract label lattice into an appropriate sequence of machine 
instructions. We then introduce a standard notion of refinement 
(§8) and show that the concrete machine running the generated IFC 
fault handler refines the abstract IFC machine and vice-versa, us- 
ing the symbolic IFC rule machine as an intermediate refinement 
point in each direction of the proof (§9). In our deterministic set- 
ting, showing refinement in both directions guarantees that the con- 
crete machine does not diverge or get stuck when handling a fault. 
We next introduce a standard termination-insensitive noninterfer- 
ence (TINI) property (§10) and show that it holds for the abstract 
machine. Since deterministic TINI is preserved by refinement, we 
conclude that the concrete machine running the generated IFC fault 
handler also satisfies TINI. Finally, we explain how to accommo- 
date two important features that are handled by our Coq develop- 
ment but elided from the foregoing sections: dynamic memory al- 
location and tags representing sets of principals (§11). We close 
with a survey of related work (§12) and a discussion of future di- 
rections (§13). A Coq script formalizing the entire development is 
available at http://www.crash-safe.org.



2. Overview of SAFE 

To establish context, we begin with a brief overview of the full 
SAFE system, concentrating on its OS- and hardware-level fea- 
tures. More detailed descriptions can be found elsewhere [24, 27, 
28, 37, 38, 47, 54].

SAFE's system software performs process scheduling, stream-based
interprocess communication, storage allocation and garbage
collection, and management of the low-level tagging hardware (the
focus of this paper). The goal is to organize these services as a
collection of mutually suspicious compartments following the principle
of least privilege (a zero-kernel OS [75]), so that an attacker
would need to compromise multiple compartments to gain complete
control of the machine. The system software is programmed in a
combination of assembly and Tempest, a new low-level systems
programming language.

The SAFE hardware integrates a number of mechanisms for 
eliminating common vulnerabilities and supporting higher-level se- 
curity primitives. To begin with, SAFE is (dynamically) typed at 
the hardware level: each data word is indelibly marked as a num- 
ber, an instruction, a pointer, etc. Next, the hardware is memory 
safe: every pointer consists of a triple of base, bounds, and offset 
(compactly encoded into 64 bits [28, 47]), and every pointer oper- 
ation includes a hardware bounds check [47]. Finally, the hardware 
associates each word in the registers and memory, as well as the 
PC, with a large (59-bit) tag. The hardware rule cache, enabling 
software-specified propagation of tags from operands to result on 
each machine step, is implemented using a combination of multiple 
hash functions to approximate a fully-associative cache [27].

An unusual feature of the SAFE design is that formal modeling 
and verification of its core mechanisms have played a central role 
in the design process since the beginning. The long-term goal — 
formally specifying and verifying the entire set of critical runtime 
services — is still some ways in the future, but key properties of 
simplified models have been verified both at the level of Breeze [37] 
(a mostly functional, security-oriented, dynamic language used for 
user-level programming on SAFE) and, in the present work, at 
the hardware and abstract machine level. Experiments are also 
underway to use random testing of properties like noninterference 
as a means to speed the design process [38]. 

Our goal in this paper is to develop a clear, precise, and math- 
ematically tractable model of one of the main innovations in the 
SAFE design: its scheme for efficiently supporting high-level data 
use policies using a combination of hardware and low-level sys- 
tem software. To make the model easy to work with, we simplify 
away many important facets of the real SAFE system. In particular, 
(i) we focus only on IFC and noninterference, although the tag- 
ging facilities of the SAFE machine are generic and can be applied 
to other policies (we return to this point in §13); (ii) we ignore 
the Breeze and Tempest programming languages and concentrate 
on the hardware and runtime services; (iii) we use a stack instead 
of registers, and we distill the instruction set to just a handful of 
opcodes; (iv) we drop SAFE's fine-grained privilege separation in
favor of a more conventional user-mode / kernel-mode dichotomy; 
(v) we shrink the rule cache to a single entry (avoiding issues of 
replacement and eviction) and maintain it in kernel memory, ac- 
cessed by ordinary loads and stores, rather than in specialized cache 
hardware; (vi) we omit a large number of IFC-related concepts 
(dynamic principals, downgrading, public labels, integrity, clear- 
ance, etc.); (vii) we handle exceptional conditions, including poten- 
tial security violations, by simply halting the whole machine; and 
(viii) most importantly, we ignore concurrency, process scheduling, 
and interprocess communication, assuming instead that the whole 
machine has a single, deterministic thread of control. The absence
of concurrency is a particularly significant simplification, given that
we are talking about an operating system that offers IFC as a service.
However, we conjecture that it may be possible to add concurrency
to our formalization, while maintaining a high degree of determinism,
by adapting the approach used in the proof of noninterference
for the seL4 microkernel [57, 58]. We return to this point in §13.

instr ::=        Basic instruction set
  | Add          addition
  | Output       output top of stack
  | Push n       push integer constant
  | Load         indirect load from data memory
  | Store        indirect store to data memory
  | Jump         unconditional indirect jump
  | Bnz n        conditional relative jump
  | Call         indirect call
  | Ret          return

Figure 1. Instruction set

3. Abstract IFC Machine 

We begin the technical development by defining a very simple
stack-and-pointer machine with "hard-wired" dynamic IFC. This machine
concisely embodies the IFC mechanism we want to provide
to higher-level software and serves as a specification for the symbolic
IFC rule machine (§4) and for the concrete machine (§5) running
our IFC fault handler (§6). The three machines share a tiny
instruction set (Fig. 1) designed to be a convenient target for compiling
the symbolic IFC rule table into machine instructions. (The Coq
development formalizes several other instructions, including Sub,
Pop, a variant of Call that takes a variable number of arguments,
and a variant of Ret that returns a result on the stack.) All
three machines use a fixed instruction memory $\iota$, a partial function
from (non-negative) integer addresses to instructions.

The machine manipulates integers (ranged over by $n$, $m$, and
$p$); unlike the real SAFE machine, we make no distinction between
raw integers and pointers (we re-introduce this distinction in §11).
Each integer is protected by an individual IFC label (ranged over by
$L$). We assume an arbitrary set of labels $\mathcal{L}$ equipped with a partial
order ($\le$), a least upper bound operation ($\vee$), and a bottom element
($\bot$). For instance, we might take $\mathcal{L}$ to be the set of levels $\{\bot, \top\}$
with $\bot \le \top$ and $\bot \vee \top = \top$. We call a pair of an integer $n$ and its
protecting label $L$ an atom, written $n@L$ and ranged over by $a$.
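As one concrete instance of such a label set, the sketch below uses finite sets of principal names ordered by inclusion, with union as the least upper bound and the empty set as bottom. The reading of a label as "the principals whose secrets may have influenced this value", and the names `Label`, `flows`, `join`, and `Atom`, are assumptions made for illustration rather than definitions taken from the paper.

```python
from typing import FrozenSet, NamedTuple

# A label is a set of principals; more principals means more restricted data.
Label = FrozenSet[str]

BOTTOM: Label = frozenset()  # the bottom element: public data


def flows(l1: Label, l2: Label) -> bool:
    """The partial order: l1 <= l2 when l1 is a subset of l2."""
    return l1 <= l2


def join(l1: Label, l2: Label) -> Label:
    """Least upper bound: the union of both sets of principals."""
    return l1 | l2


class Atom(NamedTuple):
    """An integer together with its protecting label, written n@L in the text."""
    value: int
    label: Label


alice = frozenset({"alice"})
bob = frozenset({"bob"})
secret = Atom(42, join(alice, bob))
```

The two-point lattice of the text is the special case with a single principal: the empty set plays the role of the bottom label and the singleton set plays the role of the top label.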

An abstract machine state $(\mu\ [\sigma]\ pc)$ consists of a data memory
$\mu$, a stack $\sigma$, and a program counter $pc$. (We sometimes drop
the outer brackets.) The data memory $\mu$ is a partial function from
integer addresses to atoms. We write $\mu(p) \leftarrow a$ for the memory
that coincides with $\mu$ everywhere except at $p$, where its value is $a$.
The stack $\sigma$ is essentially a list of atoms, but we distinguish stacks
beginning with return addresses (written $pc; \sigma$) from ones beginning
with regular atoms (written $a, \sigma$). Formally, stacks are lists
with two "cons" constructors, written "," and ";". This distinction is
needed so that stack-manipulating instructions can treat frame markers
specially; for example, a program that pushes an integer and then
attempts to return to it is treated as erroneous by the operational
semantics. The program counter (PC) $pc$ is an atom whose label is
used to track implicit flows, as explained below.

The step relation of the abstract machine, written
$\iota \vdash \mu_1\ [\sigma_1]\ pc_1 \xrightarrow{\alpha} \mu_2\ [\sigma_2]\ pc_2$,
is a partial function taking a machine state to a machine state plus
an output action $\alpha$, which can be either an atom or the silent
action $\tau$. We generally omit $\iota$ from transitions because it is
fixed. Throughout the paper we study other, similar relations, and
consistently refer to non-silent actions as events (ranged over by $e$).



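$$\frac{\iota(n) = \mathsf{Add}}{\mu\ [n_1@L_1, n_2@L_2, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [(n_1+n_2)@(L_1 \vee L_2), \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Output}}{\mu\ [m@L_1, \sigma]\ n@L_{pc} \xrightarrow{m@(L_1 \vee L_{pc})} \mu\ [\sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Push}\ m}{\mu\ [\sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [m@\bot, \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Load} \quad \mu(p) = m@L_2}{\mu\ [p@L_1, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [m@(L_1 \vee L_2), \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Store} \quad \mu(p) = k@L_3 \quad L_1 \vee L_{pc} \le L_3 \quad \mu(p) \leftarrow (m@(L_1 \vee L_2 \vee L_{pc})) = \mu'}{\mu\ [p@L_1, m@L_2, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu'\ [\sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Jump}}{\mu\ [n'@L_1, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [\sigma]\ n'@(L_1 \vee L_{pc})}$$

$$\frac{\iota(n) = \mathsf{Bnz}\ k \quad n' = n + ((m = 0)\ ?\ 1 : k)}{\mu\ [m@L_1, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [\sigma]\ n'@(L_1 \vee L_{pc})}$$

$$\frac{\iota(n) = \mathsf{Call}}{\mu\ [n'@L_1, a, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [a, (n+1)@L_{pc}; \sigma]\ n'@(L_1 \vee L_{pc})}$$

$$\frac{\iota(n) = \mathsf{Ret}}{\mu\ [n'@L_1; \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [\sigma]\ n'@L_1}$$

Figure 2. Semantics of IFC abstract machine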

The stepping rules in Fig. 2 adapt a standard purely dynamic
IFC enforcement mechanism [4, 67] to a low-level machine, following
recent work by Hriţcu et al. [38]. The rule for Add joins
($\vee$) the labels of the two operands to produce the label of the result,
which ensures that the result is at least as classified as each of
the operands. (For example, suppose $\iota = [\ldots, \mathsf{Add}, \ldots]$ and $n$ is the
index of this Add instruction. Then

$\mu\ [7@\bot, 5@\top]\ n@\bot \xrightarrow{\tau} \mu\ [12@\top]\ (n+1)@\bot$.)

The rule for Push labels the integer constant added to the stack
as public ($\bot$). The rule for Jump uses join to raise the label of
the PC by the label of the integer that serves as the target address
of the jump. Similarly, Bnz raises the label of the PC by the
label of the tested integer. In both cases the value of the PC after
the instruction depends on data that could be secret, and we use
the label of the PC to track the label of data that has influenced
control flow. In order to prevent implicit flows (leaks exploiting the
control flow of the program), the Store rule joins the PC label with
the original label of the written integer and with the label of the
pointer through which the write happens. Additionally, since the
labels of memory locations are allowed to vary during execution,
we prevent leaking information via labels using a "no-sensitive-upgrade"
check [4, 86] (the $\le$ precondition in the rule for Store).¹
This check prevents memory locations labeled public from being
overwritten when either the PC or the pointer through which the
store happens has been influenced by secrets. The Output rule
labels the emitted integer with the join of its original label and the


1 More recent work further improves precision compared to the no- 
sensitive-upgrades policy [5, 36]. We went with no-sensitive-upgrades in 
this work because it is simpler and requires less bookkeeping. 
current PC label.² Finally, because of the structured control flow
imposed by the stack discipline, the rule for Ret can soundly restore 
the PC label to whatever it was at the time of the Call. (Readers less 
familiar with the intricacies of dynamic IFC may find some of these 
side conditions a bit mysterious. A longer explanation can be found 
in [38], but the details are not critical for present purposes.) 
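To make the side conditions easier to follow, here is a small executable sketch of the stepping rules of Fig. 2 over the two-point lattice. It is an informal transcription, not the Coq development: labels are encoded as the integers 0 and 1 with `max` as join, instructions are tuples such as `("Push", 5)`, the stack is a Python list with its top at index 0, and the names `Atom`, `Ret`, and `step` are hypothetical.

```python
from dataclasses import dataclass

LOW, HIGH = 0, 1  # two-point lattice: LOW <= HIGH, join is max


@dataclass(frozen=True)
class Atom:  # a regular stack or memory entry, value@label
    value: int
    label: int


@dataclass(frozen=True)
class Ret:  # a return address (frame marker), pushed only by Call
    value: int
    label: int


def step(instrs, mem, stack, pc):
    """Return (mem, stack, pc, action), or None when no rule applies."""
    n, lpc = pc.value, pc.label
    if not 0 <= n < len(instrs):
        return None
    op, *args = instrs[n]
    need = {"Add": 2, "Store": 2, "Call": 2, "Push": 0, "Ret": 0}.get(op, 1)
    top, rest = stack[:need], stack[need:]
    if len(top) < need or not all(isinstance(a, Atom) for a in top):
        return None  # operands must be regular atoms, not frame markers
    nxt = Atom(n + 1, lpc)
    if op == "Add":
        a1, a2 = top
        res = Atom(a1.value + a2.value, max(a1.label, a2.label))
        return mem, [res] + rest, nxt, None
    if op == "Output":
        (a,) = top
        return mem, rest, nxt, Atom(a.value, max(a.label, lpc))
    if op == "Push":
        return mem, [Atom(args[0], LOW)] + rest, nxt, None
    if op == "Load":
        (p,) = top
        if p.value not in mem:
            return None
        m = mem[p.value]
        return mem, [Atom(m.value, max(p.label, m.label))] + rest, nxt, None
    if op == "Store":
        p, m = top
        if p.value not in mem:
            return None
        if max(p.label, lpc) > mem[p.value].label:
            return None  # no-sensitive-upgrade check fails: machine is stuck
        new_mem = dict(mem)
        new_mem[p.value] = Atom(m.value, max(p.label, m.label, lpc))
        return new_mem, rest, nxt, None
    if op == "Jump":
        (t,) = top
        return mem, rest, Atom(t.value, max(t.label, lpc)), None
    if op == "Bnz":
        (m,) = top
        target = n + (1 if m.value == 0 else args[0])
        return mem, rest, Atom(target, max(m.label, lpc)), None
    if op == "Call":
        t, a = top
        new_pc = Atom(t.value, max(t.label, lpc))
        return mem, [a, Ret(n + 1, lpc)] + rest, new_pc, None
    if op == "Ret":
        if not stack or not isinstance(stack[0], Ret):
            return None  # returning to a plain integer is an error
        return mem, stack[1:], Atom(stack[0].value, stack[0].label), None
    return None
```

In this sketch a `None` result stands for a stuck machine. A `Store` through a secret pointer into a public location, for instance, returns `None` instead of silently upgrading the label of that location.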

All data in the machine's initial state are labelled (as in all machine
states), and the simple machine manages labels to ensure
noninterference as defined and proved in §10. There are no instructions
that explicitly raise the label (classification) of an atom. Such an
instruction, joinP, is added to the machine in §11.

4. Symbolic IFC Rule Machine 

In the abstract machine described above, IFC is tightly integrated 
into the step relation in the form of side conditions on each in- 
struction. In contrast, the concrete machine (i.e., the "hardware") 
described in §5 is generic, designed to support a wide range of 
software-defined policies (IFC and other). The machine introduced 
in this section serves as a bridge between these two models. It is 
closer to the abstract machine — indeed, its machine states and the 
behavior of the step relation are identical. The important difference 
lies in the definition of the step relation, where all the IFC-related 
aspects are factored out into a separate judgment. We can think of 
the IFC mechanism as being implemented in a separate "IFC rule 
processor" distinct from the main "CPU." In the concrete machine, 
the CPU part will remain unchanged, but the IFC rule processor 
will be implemented mostly in software (by the fault handler), with 
the hardware only providing caching of rule instances. While fac- 
toring out IFC enforcement into a separate reference monitor [72] 
is commonplace [2, 67, 70], our approach goes further. We define 
a small DSL for describing symbolic IFC rules and obtain actual 
monitors by interpreting this DSL (in this section) and by com- 
piling it into machine instructions using verified structured code 
generators (in §6 and §7). 

More formally, each stepping rule of the new machine (see Fig. 3)
includes a uniform call to an IFC enforcement relation, which itself
is parameterized by a symbolic IFC rule table $\mathcal{R}$. Given the labels
of the values relevant to an instruction, the IFC enforcement relation
(i) checks whether the execution of that instruction is allowed
in the current configuration, and (ii) if so, yields the labels to put
on the resulting PC and on any resulting value. This judgment has
the form $\vdash_{\mathcal{R}} (L_{pc}, \ell_1, \ell_2, \ell_3) \leadsto_{opcode} L_{rpc}, L_r$, where the 4-tuple
on the left-hand side represents the input PC label and three additional
input labels (more precisely, optional labels, as the number
of relevant labels depends on the opcode but the tuple is of fixed
size), and $L_{rpc}$ and $L_r$ are the resulting output labels (of which the
second might be ignored).

Let us illustrate, for a few cases, how this new judgment is used
in the stepping relation (Fig. 3). The stepping rule for Add passes
three inputs to the IFC enforcement judgment: $L_{pc}$, the label of the
current PC, and $L_1$ and $L_2$, the labels of the two operands at the top
of the stack. (The fourth element of the input tuple is written as _
because it is not needed for Add.) The IFC enforcement judgment
produces two labels: $L_{rpc}$ is used to label the next program counter
($n + 1$) and $L_r$ is used to label the result value. All the other
stepping rules follow a similar scheme. (The one for Store uses
all four input labels. In this stepping rule the resulting label $L_r$ is
used to label the new value $m$ to be stored at location $p$.)



2 We assume the observer of the events generated by Output is constrained 
by the rules of information flow — i.e., cannot freely "look inside" bare 
events. In the real SAFE machine, atoms being sent to the outside world 
need to be protected cryptographically; we are abstracting this away. 
Figure 3. Semantics of symbolic rule machine, parameterized by $\mathcal{R}$



opcode   allow                                      $e_{rpc}$                  $e_r$
add      TRUE                                       $LAB_{pc}$                 $LAB_1 \sqcup LAB_2$
output   TRUE                                       $LAB_{pc}$                 $LAB_1 \sqcup LAB_{pc}$
push     TRUE                                       $LAB_{pc}$                 BOT
load     TRUE                                       $LAB_{pc}$                 $LAB_1 \sqcup LAB_2$
store    $LAB_1 \sqcup LAB_{pc} \sqsubseteq LAB_3$  $LAB_{pc}$                 $LAB_1 \sqcup LAB_2 \sqcup LAB_{pc}$
jump     TRUE                                       $LAB_1 \sqcup LAB_{pc}$    _
bnz      TRUE                                       $LAB_1 \sqcup LAB_{pc}$    _
call     TRUE                                       $LAB_1 \sqcup LAB_{pc}$    $LAB_{pc}$
ret      TRUE                                       $LAB_1$                    _

Figure 4. Rule table $\mathcal{R}^{abs}$ corresponding to abstract IFC machine



A symbolic IFC rule table $\mathcal{R}$ describes a particular IFC enforcement
mechanism. For instance, the rule table $\mathcal{R}^{abs}$ corresponding to
the IFC mechanism of the abstract machine is shown in Fig. 4. In
general, a table $\mathcal{R}$ associates a symbolic IFC rule to each instruction
opcode (formally, $\mathcal{R}$ is a total function). Each of these rules is
formed of three symbolic expressions: (i) a boolean expression indicating
whether the execution of the instruction is allowed or not
(i.e., whether it violates the IFC enforcement mechanism); (ii) a
label-valued expression for $L_{rpc}$, the label of the next PC; and (iii) a
label-valued expression for $L_r$, the label of the result value, if there
is one. In cases where $L_r$ is not used by the corresponding opcode,
we write _ to mean "don't care," which is a synonym for BOT (the
symbolic representation of the $\bot$ label).

These symbolic expressions are written in a simple domain- 
specific language (DSL) of operations over an IFC lattice. 
$$\begin{array}{rcl}
LE,\ e_{rpc},\ e_r & ::= & BOT \mid LAB_1 \mid LAB_2 \mid LAB_3 \mid LAB_{pc} \mid LE_1 \sqcup LE_2 \\
BE,\ allow & ::= & TRUE \mid LE_1 \sqsubseteq LE_2
\end{array}$$

Figure 5. Symbolic IFC rule language syntax



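$$\frac{}{\rho \vdash TRUE} \qquad \frac{\rho \vdash LE_1 \downarrow L_1 \quad \rho \vdash LE_2 \downarrow L_2 \quad L_1 \le L_2}{\rho \vdash LE_1 \sqsubseteq LE_2}$$

$$\frac{}{\rho \vdash BOT \downarrow \bot} \qquad \frac{\rho \vdash LE_1 \downarrow L_1 \quad \rho \vdash LE_2 \downarrow L_2}{\rho \vdash (LE_1 \sqcup LE_2) \downarrow (L_1 \vee L_2)}$$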



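$$\frac{\phi(n) = \mathsf{Add}}{\mathsf{k}\ \kappa\ \mu\ [n_1@\_, n_2@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [(n_1+n_2)@T_D, \sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = \mathsf{Push}\ m}{\mathsf{k}\ \kappa\ \mu\ [\sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [m@T_D, \sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = \mathsf{Load} \quad \kappa(p) = m@T_1}{\mathsf{k}\ \kappa\ \mu\ [p@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [m@T_1, \sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = \mathsf{Store} \quad \mathit{store}\ \kappa\ p\ (m@T_1) = \kappa'}{\mathsf{k}\ \kappa\ \mu\ [p@\_, m@T_1, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa'\ \mu\ [\sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = \mathsf{Jump}}{\mathsf{k}\ \kappa\ \mu\ [n'@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [\sigma]\ n'@T_D}$$

$$\frac{\phi(n) = \mathsf{Bnz}\ k \quad n' = n + ((m = 0)\ ?\ 1 : k)}{\mathsf{k}\ \kappa\ \mu\ [m@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [\sigma]\ n'@T_D}$$

$$\frac{\phi(n) = \mathsf{Call}}{\mathsf{k}\ \kappa\ \mu\ [n'@\_, a, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [a, ((n+1)@T_D, \mathsf{k}); \sigma]\ n'@T_D}$$

$$\frac{\phi(n) = \mathsf{Ret}}{\mathsf{k}\ \kappa\ \mu\ [(n'@T_1, \pi); \sigma]\ n@\_ \xrightarrow{\tau} \pi\ \kappa\ \mu\ [\sigma]\ n'@T_1}$$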



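$$(L_{pc}, \ell_1, \ell_2, \ell_3) \vdash LAB_{pc} \downarrow L_{pc} \qquad (L_{pc}, \ell_1, L_2, \ell_3) \vdash LAB_2 \downarrow L_2$$

$$(L_{pc}, L_1, \ell_2, \ell_3) \vdash LAB_1 \downarrow L_1 \qquad (L_{pc}, \ell_1, \ell_2, L_3) \vdash LAB_3 \downarrow L_3$$

Figure 6. Symbolic IFC rule language semantics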

The grammar of this DSL (Fig. 5) includes label variables
$LAB_{pc}, \ldots, LAB_3$, which correspond to the input labels
$L_{pc}, \ldots, \ell_3$; the constant BOT; and the lattice operators $\sqcup$ (join)
and $\sqsubseteq$ (flows).

The IFC enforcement judgment looks up the corresponding 
symbolic IFC rule in the table and directly evaluates the symbolic 
expressions in terms of the corresponding lattice operations. In 
contrast, in §6 we compile this rule table into the IFC fault handler 
for the concrete machine. Formally, the IFC enforcement judgment 
is defined by 

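$$\frac{Rule_{\mathcal{R}}(opcode) = (allow, e_{rpc}, e_r) \quad \rho \vdash allow \quad \rho \vdash e_{rpc} \downarrow L_{rpc} \quad \rho \vdash e_r \downarrow L_r}{\vdash_{\mathcal{R}} \rho \leadsto_{opcode} L_{rpc}, L_r}$$

$$\frac{Rule_{\mathcal{R}}(opcode) = (allow, e_{rpc}, \_) \quad \rho \vdash allow \quad \rho \vdash e_{rpc} \downarrow L_{rpc}}{\vdash_{\mathcal{R}} \rho \leadsto_{opcode} L_{rpc}, \_}$$

where $\rho$ is a 4-tuple of labels, $Rule_{\mathcal{R}}$ looks up the relevant expressions
in rule table $\mathcal{R}$, and expression evaluation is defined in Fig. 6.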
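The enforcement judgment amounts to a small interpreter for the rule DSL. The following sketch evaluates symbolic expressions over the two-point lattice and encodes the rule table of Fig. 4. The tuple-based expression encoding, the helper `J`, and the names `eval_label`, `eval_bool`, `enforce`, and `R_ABS` are hypothetical; "don't care" entries are written as BOT, following the convention stated in the text, and the next-PC entry for push is taken to be the unchanged PC label, as in the Push rule of Fig. 2.

```python
BOTTOM, TOP = 0, 1  # two-point lattice encoded as integers; join is max

# Label expressions: "BOT", "LABpc", "LAB1", "LAB2", "LAB3", ("JOIN", e1, e2)
# Boolean expressions: "TRUE", ("FLOWS", e1, e2)


def eval_label(e, rho):
    """Evaluate a label expression under rho = (Lpc, l1, l2, l3)."""
    lpc, l1, l2, l3 = rho
    env = {"BOT": BOTTOM, "LABpc": lpc, "LAB1": l1, "LAB2": l2, "LAB3": l3}
    if isinstance(e, tuple) and e[0] == "JOIN":
        return max(eval_label(e[1], rho), eval_label(e[2], rho))
    return env[e]


def eval_bool(b, rho):
    """Evaluate a boolean expression under rho."""
    if b == "TRUE":
        return True
    _, e1, e2 = b
    return eval_label(e1, rho) <= eval_label(e2, rho)


def J(e1, e2):
    return ("JOIN", e1, e2)


# opcode -> (allow, e_rpc, e_r), transcribed from Fig. 4
R_ABS = {
    "add": ("TRUE", "LABpc", J("LAB1", "LAB2")),
    "output": ("TRUE", "LABpc", J("LAB1", "LABpc")),
    "push": ("TRUE", "LABpc", "BOT"),
    "load": ("TRUE", "LABpc", J("LAB1", "LAB2")),
    "store": (("FLOWS", J("LAB1", "LABpc"), "LAB3"),
              "LABpc", J(J("LAB1", "LAB2"), "LABpc")),
    "jump": ("TRUE", J("LAB1", "LABpc"), "BOT"),
    "bnz": ("TRUE", J("LAB1", "LABpc"), "BOT"),
    "call": ("TRUE", J("LAB1", "LABpc"), "LABpc"),
    "ret": ("TRUE", "LAB1", "BOT"),
}


def enforce(table, opcode, rho):
    """Return (L_rpc, L_r) if the instruction is allowed, else None."""
    allow, e_rpc, e_r = table[opcode]
    if not eval_bool(allow, rho):
        return None
    return eval_label(e_rpc, rho), eval_label(e_r, rho)
```

Labels that an opcode does not use can be passed as `None`, since the corresponding variables never appear in that opcode's rule. For example, `enforce(R_ABS, "add", (BOTTOM, BOTTOM, TOP, None))` evaluates to `(BOTTOM, TOP)`.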

5. Concrete Machine 

The concrete machine provides low-level support for efficiently im- 
plementing many different high-level policies (IFC and others) with 
a combination of a hardware rule cache and a software cache fault 
handler. In this section we focus on the concrete machine's hard- 
ware, which is completely generic, while in §6 we describe a spe- 
cific fault handler corresponding to the IFC rules of the symbolic 
rule machine. 

The concrete machine has the same general structure as the 
more abstract ones, but differs in several important respects. One is 
that it annotates data values with integer tags T, rather than with la- 
bels L from an abstract lattice; thus, the concrete atoms a in the data 



Figure 7. Concrete step relation (kernel mode) 

memories and the stack have the form $n@T$. Similarly, a concrete
action $\alpha$ is either a concrete atom or the silent action $\tau$. We
consistently use the word label and variable $L$ to refer to the (abstract,
lattice-structured) labels of the abstract and symbolic rule machines
and the word tag and variable $T$ for concrete integers representing
labels. Using plain integers as tags allows us to delegate their
interpretation entirely to software. In this paper we focus solely on
using tags to implement IFC labels, although they could also be
used for enforcing other policies, such as type and memory safety
or control-flow integrity. For instance, to implement the two-point
abstract lattice with $\bot \le \top$, we could use 0 to represent $\bot$ and 1 to
represent $\top$, making the operations $\vee$ and $\le$ easy to implement (see
§6). For richer abstract lattices, a more complex concrete representation
might be needed; for example, a label containing an arbitrary
set of principals might be represented concretely by a pointer to
an array data structure (see §11). In places where a tag is needed
but its value is irrelevant, the concrete machine uses a specific but
arbitrary default tag value (e.g., -1), which we write $T_D$.
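The 0/1 encoding of the two-point lattice can be checked exhaustively against the abstract operations. In the sketch below the abstract labels are modeled as the strings "bot" and "top", and the names `encode`, `tag_join`, and `tag_flows` are illustrative; using bitwise OR for join is one possible choice and is not prescribed by the text.

```python
BOT, TOP = "bot", "top"  # abstract labels of the two-point lattice
T_D = -1                 # default tag for positions where the tag is irrelevant


def abs_join(a, b):
    """Abstract least upper bound."""
    return TOP if TOP in (a, b) else BOT


def abs_flows(a, b):
    """Abstract order: everything flows to top, bot flows to everything."""
    return a == BOT or b == TOP


def encode(label):
    """Concrete integer tag representing an abstract label."""
    return 0 if label == BOT else 1


def tag_join(t1, t2):
    """Join on tags: bitwise OR of the two bits."""
    return t1 | t2


def tag_flows(t1, t2):
    """Order on tags: ordinary integer comparison."""
    return t1 <= t2


# The encoding commutes with both lattice operations on all four label pairs.
agrees = all(
    encode(abs_join(a, b)) == tag_join(encode(a), encode(b))
    and abs_flows(a, b) == tag_flows(encode(a), encode(b))
    for a in (BOT, TOP)
    for b in (BOT, TOP)
)
```

The final check is a miniature version of the obligation that a concrete representation of labels has to meet: operating on tags must give the same answer as operating on labels and then encoding.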

A second important difference is that the concrete machine has
two modes: user mode (u), for executing the ordinary user program,
and kernel mode (k), for handling rule cache faults. To support these
two modes, the concrete machine's state contains a privilege bit
$\pi$, a separate kernel instruction memory $\phi$, and a separate kernel
data memory $\kappa$, in addition to the user instruction memory $\iota$, the
user data memory $\mu$, the stack $\sigma$, and the PC. When the machine is
operating in user mode ($\pi = \mathsf{u}$), instructions are looked up using
the PC as an index into $\iota$, and loads and stores use $\mu$; when in
kernel mode ($\pi = \mathsf{k}$), the PC is treated as an index into $\phi$, and
loads and stores use $\kappa$. The concrete step relation has the form
$\iota, \phi \vdash \pi_1\ \kappa_1\ \mu_1\ [\sigma_1]\ pc_1 \xrightarrow{\alpha} \pi_2\ \kappa_2\ \mu_2\ [\sigma_2]\ pc_2$.
As before, since $\iota$ and $\phi$ are fixed, we normally leave them implicit
when writing down machine transitions.
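The interaction between the two modes and the single-entry rule cache can be illustrated for one instruction. The sketch below models a user-mode Add: on a cache hit it uses the cached output tags, and on a miss it records the faulting inputs in the cache, pushes a return frame for the user PC, and enters kernel mode at address 0. The `State` record, the split of the cache into `cache_in` and `cache_out`, and the tuple encoding of stack entries are assumptions made for this example; the user and kernel data memories are omitted because Add does not touch them.

```python
from dataclasses import dataclass, replace
from typing import Tuple

T_D = -1  # default tag


@dataclass(frozen=True)
class State:
    mode: str                   # "u" (user) or "k" (kernel)
    cache_in: Tuple             # input part of the single cache entry
    cache_out: Tuple[int, int]  # output part: (T_rpc, T_r)
    stack: Tuple                # top first; (value, tag) or ("ret", value, tag, mode)
    pc: Tuple[int, int]         # (address, tag)


def user_add(s: State) -> State:
    """One user-mode step of Add, as either a cache hit or a cache miss."""
    (n1, t1), (n2, t2), *rest = s.stack
    n, tpc = s.pc
    wanted = ("add", tpc, t1, t2, T_D)
    if s.cache_in == wanted:
        # Hit: tag the result and the next PC with the cached output tags.
        trpc, tr = s.cache_out
        return replace(s, stack=((n1 + n2, tr), *rest), pc=(n + 1, trpc))
    # Miss: save the faulting inputs, push a return frame for the current
    # user PC, and start the fault handler at kernel address 0.
    frame = ("ret", n, tpc, "u")
    return replace(s, mode="k", cache_in=wanted,
                   stack=(frame, *s.stack), pc=(0, T_D))
```

After a miss, the fault handler is expected to fill in the output part of the cache and return through the saved frame, at which point the same Add is retried and hits.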

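The concrete machine has the same instruction set as the previous
ones, allowing user programs to be run on all three machines
unchanged. But the tag-related semantics of instructions depends
on the privilege mode, and in user mode the semantics further
depends on the state of the rule cache. In the real SAFE machine, the
rule cache may contain thousands of entries and is implemented
as a separate near-associative memory [27] accessed by special
instructions. Here, for simplicity, we use a cache with just one entry,
located at the start of kernel memory, and use Load and Store
instructions to manipulate it; indeed, until we get to the extensions in
§11, it constitutes the entirety of $\kappa$.

Cache hit (normal) rules:

$$\dfrac{\iota(n) = \mathsf{Add} \quad \kappa = [\mathsf{add} \mid T_{pc} \mid T_1 \mid T_2 \mid T_D \,\|\, T_{rpc} \mid T_r]}{\mathsf{u}\ \kappa\ \mu\ [n_1@T_1, n_2@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [(n_1+n_2)@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathsf{Output} \quad \kappa = [\mathsf{output} \mid T_{pc} \mid T_1 \mid T_D \mid T_D \,\|\, T_{rpc} \mid T_r]}{\mathsf{u}\ \kappa\ \mu\ [m@T_1, \sigma]\ n@T_{pc} \xrightarrow{m@T_r} \mathsf{u}\ \kappa\ \mu\ [\sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathsf{Push}\ m \quad \kappa = [\mathsf{push} \mid T_{pc} \mid T_D \mid T_D \mid T_D \,\|\, T_{rpc} \mid T_r]}{\mathsf{u}\ \kappa\ \mu\ [\sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [m@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathsf{Load} \quad \mu(p) = m@T_2 \quad \kappa = [\mathsf{load} \mid T_{pc} \mid T_1 \mid T_2 \mid T_D \,\|\, T_{rpc} \mid T_r]}{\mathsf{u}\ \kappa\ \mu\ [p@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [m@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathsf{Store} \quad \mu(p) = k@T_3 \quad \kappa = [\mathsf{store} \mid T_{pc} \mid T_1 \mid T_2 \mid T_3 \,\|\, T_{rpc} \mid T_r] \quad \mu(p) \leftarrow (m@T_r) = \mu'}{\mathsf{u}\ \kappa\ \mu\ [p@T_1, m@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu'\ [\sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathsf{Jump} \quad \kappa = [\mathsf{jump} \mid T_{pc} \mid T_1 \mid T_D \mid T_D \,\|\, T_{rpc} \mid T_D]}{\mathsf{u}\ \kappa\ \mu\ [n'@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [\sigma]\ n'@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathsf{Bnz}\ k \quad \kappa = [\mathsf{bnz} \mid T_{pc} \mid T_1 \mid T_D \mid T_D \,\|\, T_{rpc} \mid T_D] \quad n' = n + ((m = 0)\ ?\ 1 : k)}{\mathsf{u}\ \kappa\ \mu\ [m@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [\sigma]\ n'@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathsf{Call} \quad \kappa = [\mathsf{call} \mid T_{pc} \mid T_1 \mid T_D \mid T_D \,\|\, T_{rpc} \mid T_r]}{\mathsf{u}\ \kappa\ \mu\ [n'@T_1, a, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [a, ((n+1)@T_r, \mathsf{u}); \sigma]\ n'@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathsf{Ret} \quad \kappa = [\mathsf{ret} \mid T_{pc} \mid T_1 \mid T_D \mid T_D \,\|\, T_{rpc} \mid T_D]}{\mathsf{u}\ \kappa\ \mu\ [(n'@T_1, \mathsf{u}); \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [\sigma]\ n'@T_{rpc}}$$

Cache miss (faulting) rules:

$$\dfrac{\iota(n) = \mathsf{Add} \quad \kappa_i \ne [\mathsf{add} \mid T_{pc} \mid T_1 \mid T_2 \mid T_D] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [n_1@T_1, n_2@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); n_1@T_1, n_2@T_2, \sigma]\ 0@T_D}$$

$$\dfrac{\iota(n) = \mathsf{Output} \quad \kappa_i \ne [\mathsf{output} \mid T_{pc} \mid T_1 \mid T_D \mid T_D] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [m@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); m@T_1, \sigma]\ 0@T_D}$$

$$\dfrac{\iota(n) = \mathsf{Push}\ m \quad \kappa_i \ne [\mathsf{push} \mid T_{pc} \mid T_D \mid T_D \mid T_D] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [\sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); \sigma]\ 0@T_D}$$

$$\dfrac{\iota(n) = \mathsf{Load} \quad \mu(p) = m@T_2 \quad \kappa_i \ne [\mathsf{load} \mid T_{pc} \mid T_1 \mid T_2 \mid T_D] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [p@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); p@T_1, \sigma]\ 0@T_D}$$

$$\dfrac{\iota(n) = \mathsf{Store} \quad \mu(p) = k@T_3 \quad \kappa_i \ne [\mathsf{store} \mid T_{pc} \mid T_1 \mid T_2 \mid T_3] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [p@T_1, m@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); p@T_1, m@T_2, \sigma]\ 0@T_D}$$

$$\dfrac{\iota(n) = \mathsf{Jump} \quad \kappa_i \ne [\mathsf{jump} \mid T_{pc} \mid T_1 \mid T_D \mid T_D] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [n'@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); n'@T_1, \sigma]\ 0@T_D}$$

$$\dfrac{\iota(n) = \mathsf{Bnz}\ k \quad \kappa_i \ne [\mathsf{bnz} \mid T_{pc} \mid T_1 \mid T_D \mid T_D] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [m@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); m@T_1, \sigma]\ 0@T_D}$$

$$\dfrac{\iota(n) = \mathsf{Call} \quad \kappa_i \ne [\mathsf{call} \mid T_{pc} \mid T_1 \mid T_D \mid T_D] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [n'@T_1, a, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); n'@T_1, a, \sigma]\ 0@T_D}$$

$$\dfrac{\iota(n) = \mathsf{Ret} \quad \kappa_i \ne [\mathsf{ret} \mid T_{pc} \mid T_1 \mid T_D \mid T_D] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [(n'@T_1, \pi); \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); (n'@T_1, \pi); \sigma]\ 0@T_D}$$

Figure 8. Concrete step relation: user mode (normal / faulting)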

The rule cache holds a single rule instance, represented graphically
like this: $[\,opcode \mid T_{pc} \mid T_1 \mid T_2 \mid T_3 \,\|\, T_{rpc} \mid T_r\,]$. Location 0
holds an integer representing an opcode. (Since the exact choice
of representation doesn't matter, we will denote each opcode with
a lowercase identifier; for example, we might define add = 0,
output = 1, etc.) Location 1 holds the PC tag. Locations 2 to 4
hold the tags of any other arguments needed by this particular
opcode. Location 5 holds the tag that should go on the PC after this
instruction executes, and location 6 holds the tag for the instruction's
result value, if needed. For example, suppose the cache contains
this:

$$[\,\mathsf{add} \mid 0 \mid 1 \mid 1 \mid -1 \,\|\, 0 \mid 1\,]$$

(Note that we are showing just the "payload" part of these
seven atoms; by convention, the tag part is always $T_D$, and we
do not display it.) This one-line rule cache should be thought
of as implementing a (very) partial function: when the input is
$[\,\mathsf{add} \mid 0 \mid 1 \mid 1 \mid -1\,]$, the output is (0, 1); otherwise it is undefined.
If 0 is the tag representing the label $\bot$, 1 represents $\top$, and -1 is
the default tag $T_D$, this can be interpreted abstractly as follows: "If
the next instruction is Add, the PC is labeled $\bot$, and the two relevant
arguments are both labeled $\top$, then the instruction should be
allowed, the label on the new PC should be $\bot$, and the label on the
result of the operation is $\top$."
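
The "very partial function" reading of the one-line cache can be sketched in Python as follows. This is only an illustration: the list layout mirrors the seven locations described above, while the opcode number `ADD = 0` and the function name `lookup` are assumptions made for the example.

```python
from typing import List, Optional, Tuple

ADD = 0  # hypothetical opcode number, following the "add = 0" suggestion in the text

# Locations 0-4: input part (opcode, PC tag, three argument tags).
# Locations 5-6: output part (result-PC tag, result tag).
cache: List[int] = [ADD, 0, 1, 1, -1, 0, 1]


def lookup(cache: List[int], opcode: int, t_pc: int,
           t1: int, t2: int, t3: int) -> Optional[Tuple[int, int]]:
    """Return the output tags on an exact input match, else None (undefined)."""
    if cache[:5] == [opcode, t_pc, t1, t2, t3]:
        return cache[5], cache[6]
    return None
```

Any input other than the single stored one yields `None`, which corresponds to a cache miss in the stepping rules.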

There are two sets of stepping rules governing the behavior of 
the concrete machine in user mode; which set applies depends on 
whether the current machine state matches the current contents 
of the rule cache. In the "cache hit" case the instruction executes 
normally, with the cache's output determining the new PC tag and 
result tag (if any). In the "cache miss" case, the relevant parts of 
the current state (opcode, PC tag, argument tags) are stored into the 
input part of the single cache line and the machine simulates a Call 
to the fault handler. 

To see how this works in more detail, consider the two user-mode
stepping rules for the Add instruction.

$$\dfrac{\iota(n) = \mathsf{Add} \quad \kappa = [\mathsf{add} \mid T_{pc} \mid T_1 \mid T_2 \mid T_D \,\|\, T_{rpc} \mid T_r]}{\mathsf{u}\ \kappa\ \mu\ [n_1@T_1, n_2@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [(n_1+n_2)@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathsf{Add} \quad \kappa_i \ne [\mathsf{add} \mid T_{pc} \mid T_1 \mid T_2 \mid T_D] = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [n_1@T_1, n_2@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); n_1@T_1, n_2@T_2, \sigma]\ 0@T_D}$$

In the first rule (cache hit), the side condition demands that the input
part of the current cache contents have the form
$[\mathsf{add} \mid T_{pc} \mid T_1 \mid T_2 \mid T_D]$,
where $T_{pc}$ is the tag on the current PC, $T_1$ and $T_2$ are the tags on
the top two atoms on the stack, and the fourth element is the default
tag. In this case, the output part of the rule, $[T_{rpc} \mid T_r]$, determines
the tag $T_{rpc}$ on the PC and the tag $T_r$ on the new atom pushed onto
the stack in the next machine state.

In the second rule (cache miss), the notation $[\kappa_i, \kappa_o]$ means "let
$\kappa_i$ be the input part of the current rule cache and $\kappa_o$ be the output
part." The side condition says that the current input part $\kappa_i$ does
not have the desired form $[\mathsf{add} \mid T_{pc} \mid T_1 \mid T_2 \mid T_D]$, so the machine
needs to enter the fault handler. The next machine state is formed
as follows: (i) the input part of the cache is set to the desired form
$\kappa_j$ and the output part is set to $\kappa_D = [T_D \mid T_D]$; (ii) a new return
frame is pushed on top of the stack to remember the current PC and
privilege bit (u); (iii) the privilege bit is set to k (which will cause
the next instruction to be read from the kernel instruction memory);
and (iv) the PC is set to 0, the location in the kernel instruction
memory where the fault handler routine begins.
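
The two rules can be read as a single step function that branches on whether the cache input matches. The Python sketch below is an informal rendering under several assumptions: atoms are `(payload, tag)` pairs, return frames are `(pc_atom, privilege)` pairs, the cache is a seven-element list, and `ADD = 0` is a made-up opcode number. None of these representation choices come from the paper.

```python
ADD, T_D = 0, -1  # hypothetical opcode number; default tag


def step_add_user(cache, stack, pc):
    """One user-mode step of Add; returns (privilege, cache, stack, pc)."""
    n, t_pc = pc
    (n1, t1), (n2, t2), *rest = stack
    wanted = [ADD, t_pc, t1, t2, T_D]
    if cache[:5] == wanted:
        # Cache hit: the output part supplies the new PC tag and result tag.
        t_rpc, t_r = cache[5], cache[6]
        return "u", cache, [(n1 + n2, t_r), *rest], (n + 1, t_rpc)
    # Cache miss: (i) install the inputs and reset the outputs to T_D,
    # (ii) push a return frame, (iii) enter kernel mode, (iv) jump to 0.
    new_cache = wanted + [T_D, T_D]
    return "k", new_cache, [((n, t_pc), "u"), *stack], (0, T_D)
```

Note that on a miss the operand atoms stay on the stack beneath the return frame, so the faulting instruction can be restarted unchanged once the handler returns.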

What happens next is up to the fault handler code. Its job is 
to examine the contents of the first five kernel memory locations 
and either (i) write appropriate tags for the result and new PC into 
the sixth and seventh kernel memory locations and then perform a 
Ret to go back to user mode and restart the faulting instruction, or 
(ii) stop the machine by jumping to an invalid PC (-1) to signal that 
the attempted combination of opcode and argument tags is illegal.³
This mechanism is general and can be used to implement many 
different high-level policies (IFC and others). 

In kernel mode (Fig. 7), the treatment of tags is almost com- 
pletely degenerate: to avoid infinite regress, the concrete machine 
does not consult the rule cache while in kernel mode. For most 
instructions, tags read from the current machine state are ignored 
(indicated by _) and tags written to the new state are set to T D . This 
can be seen for instance in the kernel-mode step rule for addition 

$$\dfrac{\phi(n) = \mathsf{Add}}{\mathsf{k}\ \kappa\ \mu\ [n_1@\_, n_2@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [(n_1+n_2)@T_D, \sigma]\ (n+1)@T_D}$$



³ As explained in §2, in this work we assume for simplicity that policy
violations are fatal. Recent work [37] has shown that it is possible to recover 
from IFC violations while preserving noninterference. 



The only significant exceptions to this pattern are Load and Store, 
which preserve the tag of the datum being read from or written to 
memory, and Ret, which takes both the privilege bit and the new 
PC (including its tag!) from the return frame at the top of the stack. 
This is critical, since a Ret instruction is used to return from kernel 
to user mode when the fault handler has finished executing. 

$$\dfrac{\phi(n) = \mathsf{Ret}}{\mathsf{k}\ \kappa\ \mu\ [(n'@T_1, \pi); \sigma]\ n@\_ \xrightarrow{\tau} \pi\ \kappa\ \mu\ [\sigma]\ n'@T_1}$$

A final point is that Output is not permitted in kernel mode,
which guarantees that kernel actions are always the silent action $\tau$.

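As an illustration of how all this works, suppose again that
$\iota = [\ldots, \mathsf{Add}, \ldots]$, and that the concrete integer tag 0 represents
the abstract label $\bot$, 1 represents $\top$, and -1 is $T_D$. Then:

$$\begin{array}{ll}
 & \mathsf{u}\ [\mathsf{add} \mid 0 \mid 0 \mid 1 \mid -1 \,\|\, 0 \mid 1]\ \mu\ [7@0, 5@1]\ n@0 \\
\xrightarrow{\tau} & \mathsf{u}\ [\mathsf{add} \mid 0 \mid 0 \mid 1 \mid -1 \,\|\, 0 \mid 1]\ \mu\ [12@1]\ (n+1)@0
\end{array}$$

On the other hand, if the tags on both operands are 1 (i.e., $\top$), then
the first step will miss in the cache and reduction will proceed as
follows:

$$\begin{array}{ll}
 & \mathsf{u}\ [\mathsf{add} \mid 0 \mid 0 \mid 1 \mid -1 \,\|\, 0 \mid 1]\ \mu\ [7@1, 5@1]\ n@0 \\
\xrightarrow{\tau} & \mathsf{k}\ [\mathsf{add} \mid 0 \mid 1 \mid 1 \mid -1 \,\|\, -1 \mid -1]\ \mu\ [(n@0, \mathsf{u}); 7@1, 5@1]\ 0@{-1} \\
\xrightarrow{\tau}{}^{*} & \ldots \text{fault handler runs} \ldots \\
\xrightarrow{\tau} & \mathsf{k}\ [\mathsf{add} \mid 0 \mid 1 \mid 1 \mid -1 \,\|\, 0 \mid 1]\ \mu\ [(n@0, \mathsf{u}); 7@1, 5@1]\ k@{-1} \\
\xrightarrow{\tau} & \mathsf{u}\ [\mathsf{add} \mid 0 \mid 1 \mid 1 \mid -1 \,\|\, 0 \mid 1]\ \mu\ [7@1, 5@1]\ n@0 \\
\xrightarrow{\tau} & \mathsf{u}\ [\mathsf{add} \mid 0 \mid 1 \mid 1 \mid -1 \,\|\, 0 \mid 1]\ \mu\ [12@1]\ (n+1)@0
\end{array}$$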

6. Fault Handler for IFC 

Now we assemble the pieces. A concrete IFC machine implement- 
ing the symbolic rule machine defined in §4 can be obtained by 
installing appropriate fault handler code in the kernel instruction 
memory of the concrete machine presented in §5. In essence, this 
handler must emulate how the symbolic rule machine looks up 
and evaluates the DSL expressions in a given IFC rule table. We 
choose to generate the handler code by compiling the lookup and 
DSL evaluation relations directly into machine code. (An alterna- 
tive would be to represent the rule table as abstract syntax in the 
kernel memory and write an interpreter in machine code for the 
DSL, but the compilation approach seems to lead to simpler code 
and proofs.) 

The handler compilation scheme is given in Fig. 9. Each gen* 
function generates a list of concrete machine instructions; the se- 
quence generated by the top-level genFaultHandler is intended to 
be installed starting at location 0 in the concrete machine's kernel 
instruction memory. The implicit addr* parameters are symbolic 
names for the locations of the opcode and various tags in the con- 
crete machine's rule cache, as described in §5. The entire generator 
is parameterized by an arbitrary rule table $\mathcal{R}$. We make heavy use
of the (obvious) encoding of booleans where false is represented by 
0 and true by any non-zero value. 

The top-level handler works in three phases. The first phase,
genComputeResults, does most of the work: it consists of a large
nested if-then-else chain, built using genIndexedCases, that compares
the opcode of the faulting instruction against each possible
opcode and, on a match, executes the code generated for the
corresponding symbolic IFC rule. The code generated for each symbolic
IFC rule (by genApplyRule) pushes its results onto the stack: a flag
indicating whether the instruction is allowed and, if so, the
result-PC and result-value tags. This first phase never writes to memory
or transfers control outside the handler; this makes it fairly easy to
prove correct.

The second phase of the top-level handler, genStoreResults,
reads the computed results off the stack and updates the rule cache
appropriately. If the result indicates that the instruction is allowed,
the result PC and value tags are written to the cache, and true is
pushed on the stack; otherwise, nothing is written to the cache, and
false is pushed on the stack.

The third and final phase of the top-level handler tests the
boolean just pushed onto the stack and either returns to user code
(instruction is allowed) or jumps to address -1 (disallowed).
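
The three phases can be mimicked at a high level in Python, operating directly on the cache rather than on generated machine code. This is a sketch under stated assumptions: the opcode numbers, the `RULES` table, and its three entries are illustrative stand-ins for a rule table over the two-point lattice (0 for bottom, 1 for top), and treating an unknown opcode as disallowed is a simplification made here, not something the text specifies.

```python
T_D = -1
ADD, OUTPUT, STORE = 0, 1, 2  # hypothetical opcode numbers

# Illustrative rule table: each rule maps (t_pc, t1, t2, t3) to
# (allowed, result-PC tag, result tag). Join is |, flows is <= on 0/1 tags.
RULES = {
    ADD:    lambda pc, t1, t2, t3: (True, pc, t1 | t2),
    OUTPUT: lambda pc, t1, t2, t3: (True, pc, t1 | pc),
    STORE:  lambda pc, t1, t2, t3: ((t1 | pc) <= t3, pc, t1 | t2 | pc),
}


def fault_handler(cache):
    """Return True to go back to user mode, False to halt (pc = -1)."""
    opcode, t_pc, t1, t2, t3 = cache[:5]
    # Phase 1: compute the results without writing to the cache.
    rule = RULES.get(opcode)
    allowed, t_rpc, t_r = rule(t_pc, t1, t2, t3) if rule else (False, T_D, T_D)
    # Phase 2: write the output tags only when the instruction is allowed.
    if allowed:
        cache[5], cache[6] = t_rpc, t_r
    # Phase 3: the boolean decides between Ret and the jump to -1.
    return allowed
```

Keeping phase 1 free of side effects mirrors the remark above that the first phase is the easy one to prove correct.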

The code for symbolic rule compilation is built by straightforward
recursive traversal of the rule DSL syntax for label-valued
expressions (genELab) and boolean-valued expressions (genBool).
These functions are (implicitly) parameterized by the definitions of
lattice-specific generators genBot, genJoin, and genFlows. To
implement these generators for a particular lattice, we first need to
choose how to represent abstract labels as integer tags, and then
determine a sequence of instructions that encodes each operation. We
call such an encoding scheme a concrete lattice. For example, the
abstract labels in the two-point lattice can be encoded like booleans,
representing $\bot$ by 0, $\top$ by non-0, and instantiating genBot, genJoin,
and genFlows with code for computing false, disjunction, and
implication, respectively. A simple concrete lattice like this can be
formalized as a tuple CL = (Tag, Lab, genBot, genJoin, genFlows),
where the encoding and decoding functions Lab and Tag satisfy
$\mathit{Lab} \circ \mathit{Tag} = \mathit{id}$; to streamline the exposition, we assume this form
of concrete lattice for most of the paper. The more realistic
encoding in §11 will require a more complex treatment.

To raise the level of abstraction of the handler code, we make
heavy use of structured code generators; this makes it easier both
to understand the code and to prove it correct using a custom
Hoare logic that follows the structure of the generators (see §7). For
example, the genIf function takes two code sequences, representing
the "then" and "else" branches of a conditional, and generates
code to test the top of the stack and dispatch control appropriately.
The higher-order generator genIndexedCases takes a list of integer
indices (e.g., opcodes) and functions for generating guards and
branch bodies from an index, and generates code that will run the
guards in order until one of them computes true, at which point the
corresponding branch body is run.
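
To make the structured generators concrete, the following Python sketch transcribes a few of the definitions from Fig. 9 and runs them on a tiny interpreter. The interpreter is an assumption of this example: it handles only Push and Bnz, ignores tags, and keeps the top of the stack at index 0. The generator bodies follow Fig. 9, with Python naming.

```python
def gen_true():      return [("Push", 1)]
def gen_false():     return [("Push", 0)]
def gen_skip_if(n):  return [("Bnz", n + 1)]
def gen_skip(n):     return gen_true() + gen_skip_if(n)
def gen_pop():       return [("Bnz", 1)]


def gen_if(t, f):
    f2 = f + gen_skip(len(t))  # else branch, then an unconditional skip over t
    return gen_skip_if(len(f2)) + f2 + t


def gen_not(): return gen_if(gen_false(), gen_true())
def gen_and(): return gen_if([], gen_pop() + gen_false())
def gen_or():  return gen_if(gen_pop() + gen_true(), [])


def run(code, stack):
    """Run Push/Bnz code until the PC falls off the end; payloads only."""
    pc, stack = 0, list(stack)
    while pc < len(code):
        op, arg = code[pc]
        if op == "Push":
            stack.insert(0, arg)
            pc += 1
        elif op == "Bnz":
            m = stack.pop(0)
            pc += 1 if m == 0 else arg
        else:
            raise ValueError(op)
    return stack
```

For instance, `run(gen_if([("Push", 7)], [("Push", 9)]), [0, 3])` takes the else branch because the tested value is 0. The same code layout is what makes `gen_pop` work: `Bnz 1` advances by one instruction whether or not the popped value is zero.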



7. Correctness of the Fault Handler Generator 

We now turn our attention to verification, beginning with the fault
handler. We must show that the generated fault handler emulates the
IFC enforcement judgment $\vdash_{\mathcal{R}} (L_{pc}, \ell_1, \ell_2, \ell_3) \leadsto_{opcode} L_{rpc}, L_r$
of the symbolic rule machine. The statement and proof of correctness
are parametric over the symbolic IFC rule table $\mathcal{R}$ and concrete
lattice, and hence over correctness lemmas for the lattice
operations.



Correctness statement. Let $\mathcal{R}$ be an arbitrary rule table and
$\phi_{\mathcal{R}} = \mathit{genFaultHandler}\ \mathcal{R}$ be the corresponding generated fault handler.
We specify how $\phi_{\mathcal{R}}$ behaves as a whole (as a relation between
initial state on entry and final state on completion) using the relation
$\phi \vdash cs_1 \to^{*}_{\mathsf{k}} cs_2$, defined as the reflexive transitive closure of
the concrete step relation, with the constraints that the fault handler
code is $\phi$ and all intermediate states (i.e., strictly preceding $cs_2$)
have privilege bit k.

The correctness statement is captured by the following two
lemmas. Intuitively, if the symbolic IFC enforcement judgment
allows some given user instruction, then executing $\phi_{\mathcal{R}}$ (stored
at kernel mode location 0) updates the cache to contain the tag
encoding of the appropriate result labels and returns to user mode;
otherwise, $\phi_{\mathcal{R}}$ halts the machine (pc = -1).



genFaultHandler R = genComputeResults R ++
                    genStoreResults ++
                    genIf [Ret] [Push (-1); Jump]

genComputeResults R =
  genIndexedCases [] genMatchOp (genApplyRule ∘ Rule_R) opcodes

genMatchOp op =
  [Push op] ++ genLoadFrom addrOpLabel ++ genEqual
genEqual = [Sub] ++ genNot

genApplyRule ⟨allow, e_rpc, e_r⟩ = genBool allow ++
  genIf (genSome (genELab e_rpc ++ genELab e_r)) genNone

genELab BOT       = genBot
        LAB_i     = genLoadFrom addrTag_i
        LE1 ⊔ LE2 = genELab LE2 ++ genELab LE1 ++ genJoin

genBool TRUE      = genTrue
        LE1 ⊑ LE2 = genELab LE2 ++ genELab LE1 ++ genFlows

genStoreResults =
  genIf (genStoreAt addrTag_r ++ genStoreAt addrTag_rpc ++ genTrue)
        genFalse

genFalse = [Push 0]
genTrue  = [Push 1]

genAnd = genIf [] (genPop ++ genFalse)
genOr  = genIf (genPop ++ genTrue) []

genNot    = genIf genFalse genTrue
genImpl   = genNot ++ genOr
genSome c = c ++ genTrue
genNone   = genFalse

genIndexedCases genDefault genGuard genBody = g
  where g nil       = genDefault
        g (n :: ns) = genGuard n ++ genIf (genBody n) (g ns)

genIf t f = genSkipIf (length f') ++ f' ++ t
  where f' = f ++ genSkip (length t)

genSkip n   = genTrue ++ genSkipIf n
genSkipIf n = [Bnz (n+1)]

genStoreAt p  = [Push p; Store]
genLoadFrom p = [Push p; Load]
genPop        = [Bnz 1]

opcodes = add :: output :: ... :: ret :: nil

Figure 9. Generation of fault handler from IFC rule table.



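Lemma 7.1 (Fault handler correctness, allowed case). Suppose
that $\vdash_{\mathcal{R}} (L_{pc}, \ell_1, \ell_2, \ell_3) \leadsto_{opcode} L_{rpc}, L_r$ and

$$\kappa_i = [\,opcode \mid \mathit{Tag}(L_{pc}) \mid \mathit{Tag}(\ell_1) \mid \mathit{Tag}(\ell_2) \mid \mathit{Tag}(\ell_3)\,].$$

Then

$$\phi_{\mathcal{R}} \vdash \langle \mathsf{k}\ [\kappa_i, \kappa_o]\ \mu\ [(pc, \mathsf{u}); \sigma]\ 0@T_D \rangle \to^{*}_{\mathsf{k}} \langle \mathsf{u}\ [\kappa_i, \kappa_o']\ \mu\ [\sigma]\ pc \rangle$$

with output cache $\kappa_o' = (\mathit{Tag}(L_{rpc}), \mathit{Tag}(L_r))$.

Lemma 7.2 (Fault handler correctness, disallowed case). Suppose
that $\vdash_{\mathcal{R}} (L_{pc}, \ell_1, \ell_2, \ell_3) \not\leadsto_{opcode}$, and

$$\kappa_i = [\,opcode \mid \mathit{Tag}(L_{pc}) \mid \mathit{Tag}(\ell_1) \mid \mathit{Tag}(\ell_2) \mid \mathit{Tag}(\ell_3)\,].$$

Then, for some final stack $\sigma'$,

$$\phi_{\mathcal{R}} \vdash \langle \mathsf{k}\ [\kappa_i, \kappa_o]\ \mu\ [(pc, \mathsf{u}); \sigma]\ 0@T_D \rangle \to^{*}_{\mathsf{k}} \langle \mathsf{k}\ [\kappa_i, \kappa_o]\ \mu\ [\sigma']\ {-1}@T_D \rangle.$$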

Proof methodology. The fault handler is simple enough that a total
structured language, with a few local control flow primitives,
global memory, and stack, but without subroutines or local
variables, is enough to express it. The fault handler is compiled by
composing generators (Fig. 9); accordingly, the proofs of these two
lemmas reduce to correctness proofs for the generators. We employ a
custom Hoare logic for specifying the generators themselves, which
makes the code generation proof simple, reusable, and scalable. This
is where defining a DSL for IFC rules and a structured compiler proves
to be a very useful approach, e.g., compared to symbolic interpretation
of hand-written code.

Our logic comprises two notions of Hoare triple. The generated
code mostly consists of self-contained instruction sequences that
terminate by "falling off the end", i.e., that never return or jump
outside themselves, although they may contain internal jumps (e.g.,
to implement conditionals). The only exception is the final step of
the handler (third line of genFaultHandler in Fig. 9). We therefore
define a standard Hoare triple $\{P\}\ c\ \{Q\}$, suitable for reasoning
about self-contained code, and use it for the bulk of the proof.
To specify the final handler step, we define a non-standard triple
$\{P\}\ c\ \{Q\}^{O}_{pc}$ for reasoning about escaping code.

Self-contained-code Hoare triples. The triple $\{P\}\ c\ \{Q\}$, where
$P$ and $Q$ are predicates on $\kappa \times \sigma$, says that, if the kernel instruction
memory $\phi$ contains the code sequence $c$ starting at the current PC,
and if the current memory and stack satisfy $P$, then the machine
will run (in kernel mode) until the PC points to the instruction
immediately following the sequence $c$, with a resulting memory
and stack satisfying $Q$. In symbols:

$$\begin{array}{l}
\{P\}\ c\ \{Q\} \triangleq \\
\quad c = \phi(n), \ldots, \phi(n'-1) \wedge P(\kappa, \sigma) \implies \\
\quad \exists \kappa'\ \sigma'.\ Q(\kappa', \sigma') \wedge \phi \vdash \langle \mathsf{k}\ \kappa\ \mu\ [\sigma]\ n@T_D \rangle \to^{*}_{\mathsf{k}} \langle \mathsf{k}\ \kappa'\ \mu\ [\sigma']\ n'@T_D \rangle
\end{array}$$

Note that the instruction memory $\phi$ is unconstrained outside of $c$, so
if $c$ is not self-contained, no triple about it will be provable; thus,
these triples obey the usual composition laws. Also, because the
concrete machine is deterministic, these triples express total, rather
than partial, correctness, which is essential for proving termination
in Lemmas 7.1 and 7.2. To aid automation of proofs about code
sequences, we give triples in weakest-precondition style.



$$\dfrac{\forall \kappa\, \sigma.\ P(\kappa, \sigma) \implies P'(\kappa, \sigma) \qquad \{P'\}\ c\ \{Q'\} \qquad \forall \kappa\, \sigma.\ Q'(\kappa, \sigma) \implies Q(\kappa, \sigma)}{\{P\}\ c\ \{Q\}}$$

$$\dfrac{}{\{P\}\ []\ \{P\}} \qquad\qquad \dfrac{\{P_1\}\ c_1\ \{P_2\} \qquad \{P_2\}\ c_2\ \{P_3\}}{\{P_1\}\ c_1 \mathbin{+\!\!+} c_2\ \{P_3\}}$$

We build proofs by composing atomic specifications of individual
instructions, such as

$$\dfrac{P(\kappa, \sigma) := \exists n_1\, T_1\, n_2\, T_2\, \sigma'.\ \sigma = n_1@T_1, n_2@T_2, \sigma' \wedge Q(\kappa, ((n_1+n_2)@T_D, \sigma'))}{\{P\}\ [\mathsf{Add}]\ \{Q\}}$$

with specifications for structured code generators, such as

$$\dfrac{\begin{array}{c} P(\kappa, \sigma) := \exists n\, T\, \sigma'.\ \sigma = n@T, \sigma' \wedge (n \ne 0 \implies P_1(\kappa, \sigma')) \wedge (n = 0 \implies P_2(\kappa, \sigma')) \\ \{P_1\}\ c_1\ \{Q\} \qquad \{P_2\}\ c_2\ \{Q\} \end{array}}{\{P\}\ \mathit{genIf}\ c_1\ c_2\ \{Q\}}$$

(We emphasize that all such specifications are verified, not
axiomatized as the inference rule notation might suggest.) We also prove a
specification for the specialized case statement genIndexedCases.
Although this specification is quite complex when written in full
detail (and is thus omitted here), it is intuitively simple: given a list
of indices and functions for generating guards and branches from
the indices, genIndexedCases will run the guards in order until one
of them computes true (more precisely, its integer encoding 1), at
which point the corresponding branch is run.
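
The weakest-precondition reading of these specifications can be illustrated with predicate transformers in Python. This is an informal sketch, not the machine-checked logic: predicates are modelled as Python functions on a `(cache, stack)` pair, atoms as `(payload, tag)` pairs, and the names `wp_add` and `wp_seq` are invented for the example.

```python
T_D = -1  # default tag written by kernel-mode instructions


def wp_add(Q):
    """Precondition of [Add] for postcondition Q, following the Add spec."""
    def P(cache, stack):
        if len(stack) < 2:
            return False  # no two operands: the existential cannot be witnessed
        (n1, _), (n2, _), *rest = stack
        return Q(cache, [(n1 + n2, T_D), *rest])
    return P


def wp_seq(wp1, wp2):
    """Sequencing: the precondition of c1 ++ c2 is wp1 applied to wp2's result."""
    return lambda Q: wp1(wp2(Q))
```

Because each specification computes its precondition from an arbitrary postcondition, composing code sequences amounts to composing these functions, which is what makes the style convenient for automation.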

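The concrete implementations of the lattice operations are also
specified using triples in this style.

$$\dfrac{P(\kappa, \sigma) := Q(\kappa, (\mathit{Tag}(\bot)@T_D, \sigma))}{\{P\}\ \mathit{genBot}\ \{Q\}}$$

$$\dfrac{P(\kappa, \sigma) := \exists L\, L'\, \sigma'.\ \sigma = \mathit{Tag}(L)@T_D, \mathit{Tag}(L')@T_D, \sigma' \wedge Q(\kappa, (\mathit{Tag}(L \vee L')@T_D, \sigma'))}{\{P\}\ \mathit{genJoin}\ \{Q\}}$$

$$\dfrac{P(\kappa, \sigma) := \exists L\, L'\, \sigma'.\ \sigma = \mathit{Tag}(L)@T_D, \mathit{Tag}(L')@T_D, \sigma' \wedge Q(\kappa, ((\text{if } L \le L' \text{ then } 1 \text{ else } 0)@T_D, \sigma'))}{\{P\}\ \mathit{genFlows}\ \{Q\}}$$

For the two-point lattice, it is easy to prove that the implemented
operators satisfy these specifications; §11 describes an analogous
result for a lattice of sets of principals.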

Going a bit further towards bridging the gap between the symbolic
rule and concrete machines, we prove specifications for the
generation of label expressions

$$\dfrac{\rho \vdash LE \downarrow L \qquad P(\kappa, \sigma) := \kappa = \kappa_0 \wedge \sigma = \sigma_0 \qquad Q(\kappa, \sigma) := \kappa = \kappa_0 \wedge \sigma = \mathit{Tag}(L)@T_D, \sigma_0}{\{P\}\ \mathit{genELab}\ LE\ \{Q\}}$$

and for the code generated to implement the application of a
symbolic IFC rule. For instance, the case where the
instruction is allowed is described by the specification:

Escaping-code Hoare triples. To be able to specify the entire code
of the generated fault handler, we also define a second form of
triple, $\{P\}\ c\ \{Q\}^{O}_{pc}$, which specifies mostly self-contained, total
code $c$ that either makes exactly one jump outside of $c$ or returns
out of kernel mode. This non-locality is needed because the fault
handler checks whether an information flow violation is about to
occur, and returns to the user-mode caller if not (Success), or
jumps to an invalid address (Failure) otherwise. More precisely,
if $P$ and $Q$ are predicates on $\kappa \times \sigma$ and $O$ is a function from
$\kappa \times \sigma$ to outcomes (the constants Success and Failure), then
$\{P\}\ c\ \{Q\}^{O}_{pc}$ holds if, whenever the kernel instruction memory $\phi$
contains the sequence $c$ starting at the current PC and the current cache
and stack satisfy $P$, the following hold:

• if $O$ computes Success then the machine runs (in kernel mode)
until it returns to user code at $pc$, and $Q$ is satisfied.

• if $O$ computes Failure then the machine runs (in kernel mode)
until it halts ($pc = -1$ in kernel mode), and $Q$ is satisfied.

Or, in symbols,

$$\begin{array}{l}
\{P\}\ c\ \{Q\}^{O}_{pc} \triangleq \\
\quad c = \phi(n), \ldots, \phi(n + |c| - 1) \wedge P(\kappa, \sigma) \implies \\
\quad \exists \kappa'\ \sigma'.\ Q(\kappa', \sigma') \\
\qquad \wedge\ \big(O(\kappa, \sigma) = \mathit{Success} \implies \phi \vdash \langle \mathsf{k}\ \kappa\ \mu\ [\sigma]\ n@T_D \rangle \to^{*}_{\mathsf{k}} \langle \mathsf{u}\ \kappa'\ \mu\ [\sigma']\ pc \rangle\big) \\
\qquad \wedge\ \big(O(\kappa, \sigma) = \mathit{Failure} \implies \phi \vdash \langle \mathsf{k}\ \kappa\ \mu\ [\sigma]\ n@T_D \rangle \to^{*}_{\mathsf{k}} \langle \mathsf{k}\ \kappa'\ \mu\ [\sigma']\ {-1}@T_D \rangle\big)
\end{array}$$

To compose self-contained code with escaping code, we prove
two composition laws for these triples, one for pre-composing with
specified self-contained code and another for post-composing with
arbitrary (unreachable) code:

$$\dfrac{\{P_1\}\ c_1\ \{P_2\} \qquad \{P_2\}\ c_2\ \{P_3\}^{O}_{pc}}{\{P_1\}\ c_1 \mathbin{+\!\!+} c_2\ \{P_3\}^{O}_{pc}} \qquad\qquad \dfrac{\{P\}\ c_1\ \{Q\}^{O}_{pc}}{\{P\}\ c_1 \mathbin{+\!\!+} c_2\ \{Q\}^{O}_{pc}}$$

We use these new triples to specify the Ret and Jump instructions,
which could not be given useful specifications using the
self-contained-code triples, e.g.

$$\dfrac{P(\kappa, \sigma) := \exists \sigma'.\ Q(\kappa, \sigma') \wedge \sigma = (pc, \mathsf{u}); \sigma' \qquad O(\kappa, \sigma) := \mathit{Success}}{\{P\}\ [\mathsf{Ret}]\ \{Q\}^{O}_{pc}}$$

Everything comes together in verifying the fault handler. We
use self-contained-code triples to specify everything except for [Ret],
[Jump], and the final genIf, and then use the escaping-code triple
composition laws to connect the non-returning part of the fault
handler to the final genIf.

8. Refinement 

We have two remaining verification goals. First, we want to show
that the concrete machine of §5 (running the fault handler of §6
compiled from $\mathcal{R}^{abs}$) enjoys TINI. Proving this directly for the
concrete machine would be dauntingly complex, so instead we show
that the concrete machine is an implementation of the abstract
machine, for which noninterference will be much easier to prove (§10).
Second, since a trivial always-diverging machine also has TINI, we
want to show that the concrete machine is a faithful implementation
of the abstract machine that emulates all its behaviors.

We phrase these two results using the notion of machine refine- 
ment, which we develop in this section, and which we prove in 
§10 to be TINI preserving. In §9, we prove a two-way refinement 
(one direction for each goal), between the abstract and concrete 
machines, via the symbolic rule machine in both directions. 

From here on we sometimes mention different machines (abstract,
symbolic rule, or concrete) in the same statement (e.g., when
discussing refinement), and sometimes talk about machines
generically (e.g., when defining TINI for all our machines); for these
purposes, it is useful to define a generic notion of machine.

Definition 8.1. A generic machine (or just machine) is a 5-tuple
$M = (S, E, I, \cdot \to \cdot, \mathit{Init})$, where $S$ is a set of states (ranged
over by $s$), $E$ is a set of events (ranged over by $e$),
$\cdot \xrightarrow{\cdot} \cdot \subseteq S \times (E + \{\tau\}) \times S$ is a step relation, and $I$ is a set of input data
(ranged over by $i$) that can be used to build initial states of the
machine with the function $\mathit{Init} \in I \to S$. We call $E + \{\tau\}$ the set
of actions of $M$ (ranged over by $\alpha$).

Conceptually, a machine's program is included in its input data 
and gets "loaded" by the function Init, which also initializes the 
machine memory, stack, and PC. The notion of generic machine 
abstracts all these details, allowing uniform definitions of refine- 
ment and TINI that apply to all three of our IFC machines. To avoid 
stating it several times below, we stipulate that when we instanti- 
ate Definition 8.1 to any of our IFC machines, Init must produce 
an initial stack with no return frames. 

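A generic step $s_1 \xrightarrow{e} s_2$ or $s_1 \xrightarrow{\tau} s_2$ produces event $e$ or is
silent. The reflexive-transitive closure of such steps, omitting silent
steps (written $s_1 \xrightarrow{t}{}^{*} s_2$) produces traces, i.e., lists, $t$, of events. It
is defined inductively by

$$\dfrac{}{s \xrightarrow{\epsilon}{}^{*} s} \qquad \dfrac{s_1 \xrightarrow{e} s_2 \quad s_2 \xrightarrow{t}{}^{*} s_3}{s_1 \xrightarrow{e.t}{}^{*} s_3} \qquad \dfrac{s_1 \xrightarrow{\tau} s_2 \quad s_2 \xrightarrow{t}{}^{*} s_3}{s_1 \xrightarrow{t}{}^{*} s_3} \qquad (1)$$

where we write $\epsilon$ for the empty trace and $e.t$ for consing $e$ to $t$.
When the end state of a step starting in state $s$ is not relevant we
write $s \xrightarrow{e}$, and similarly $s \xrightarrow{t}{}^{*}$ for traces.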
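
As a small illustration of how traces arise from steps, the Python sketch below runs a machine for a bounded number of steps and keeps only the visible events. It assumes a deterministic step function returning either `None` (no step possible) or an `(action, next_state)` pair, with `TAU` standing for the silent action; the generic definition above uses a step relation, so this is a special case chosen for the example.

```python
TAU = None  # stands for the silent action


def trace(actions):
    """Visible trace of a finite run: silent actions are dropped."""
    return [a for a in actions if a is not TAU]


def run(step, state, fuel):
    """Iterate `step` at most `fuel` times; return (trace, final state)."""
    actions = []
    for _ in range(fuel):
        result = step(state)
        if result is None:  # stuck: no further step
            break
        action, state = result
        actions.append(action)
    return trace(actions), state
```

A run with only silent steps therefore yields the empty trace, matching the third rule of (1).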

When relating executions of two different machines through a
refinement, we establish a correspondence between their traces.
This relation is usually derived from an elementary relation on
events, $\triangleright \subseteq E_1 \times E_2$, which is lifted to actions and traces:

Definition 8.2 (Matching). Given a relation $\triangleright \subseteq E_1 \times E_2$ between
two sets of events, its lifts to actions and traces are defined:

$$\begin{array}{rcl}
\alpha_1\ [\triangleright]\ \alpha_2 & \triangleq & (\alpha_1 = \tau = \alpha_2 \ \vee\ \alpha_1 = e_1 \triangleright e_2 = \alpha_2) \\
x\ [\triangleright]\ y & \triangleq & \mathit{length}(x) = \mathit{length}(y) \wedge \forall i.\ x_i \triangleright y_i.
\end{array}$$

We are now ready to define refinement.

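Definition 8.3 (Refinement). Let $M_1 = (S_1, E_1, I_1, \cdot \to_1 \cdot, \mathit{Init}_1)$
and $M_2 = (S_2, E_2, I_2, \cdot \to_2 \cdot, \mathit{Init}_2)$ be two machines. A
refinement of $M_1$ into $M_2$ is a pair of relations $(\triangleright_i, \triangleright_e)$, where
$\triangleright_i \subseteq I_1 \times I_2$ and $\triangleright_e \subseteq E_1 \times E_2$, such that whenever
$i_1 \triangleright_i i_2$ and $\mathit{Init}_2(i_2) \xrightarrow{t_2}{}^{*}$, there exists a trace $t_1$ such that
$\mathit{Init}_1(i_1) \xrightarrow{t_1}{}^{*}$ and $t_1\ [\triangleright_e]\ t_2$. We also say that $M_2$ refines $M_1$.
Graphically:

$$\begin{array}{ccl}
i_1 & \quad & \mathit{Init}_1(i_1) \xrightarrow{t_1}{}^{*} \quad \text{(dashed)} \\
\triangleright_i & & \qquad [\triangleright_e] \quad \text{(dashed)} \\
i_2 & & \mathit{Init}_2(i_2) \xrightarrow{t_2}{}^{*}
\end{array}$$

(Plain lines denote premises, dashed ones conclusions.)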
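
The lifting of an event relation to traces (Definition 8.2), which the refinement condition relies on, is easy to state executably. In the Python sketch below, events are `(payload, label_or_tag)` pairs and `TAG` is a made-up encoding of a two-point lattice; both are assumptions for illustration only.

```python
def match_traces(match_event, xs, ys):
    """x [match] y: equal lengths and pointwise-related events."""
    return len(xs) == len(ys) and all(
        match_event(x, y) for x, y in zip(xs, ys)
    )


# Hypothetical event relation: same payload, label encoded by TAG.
TAG = {"bot": 0, "top": 1}


def match_event(abstract_event, concrete_event):
    (n1, label), (n2, tag) = abstract_event, concrete_event
    return n1 == n2 and TAG[label] == tag
```

Checking a refinement on one concrete run then amounts to exhibiting a trace of the other machine for which `match_traces` holds.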

In order to prove refinement, we need a variant that considers 
executions starting at arbitrary related states. 

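Definition 8.4 (Refinement via states). Let $M_1$, $M_2$ be as above. A
state refinement of $M_1$ into $M_2$ is a pair of relations $(\triangleright_s, \triangleright_e)$, where
$\triangleright_s \subseteq S_1 \times S_2$ and $\triangleright_e \subseteq E_1 \times E_2$, such that, whenever $s_1 \triangleright_s s_2$
and $s_2 \xrightarrow{t_2}{}^{*}$, there exists $t_1$ such that $s_1 \xrightarrow{t_1}{}^{*}$ and $t_1\ [\triangleright_e]\ t_2$.

$$\begin{array}{ccl}
s_1 & \xrightarrow{t_1}{}^{*} & \text{(dashed)} \\
\triangleright_s & [\triangleright_e] & \text{(dashed)} \\
s_2 & \xrightarrow{t_2}{}^{*} &
\end{array}$$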

If the relation on inputs is compatible with the one on states, we 
can use state refinement to prove refinement. 

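Lemma 8.5. Suppose $i_1 \triangleright_i i_2 \implies \mathit{Init}_1(i_1) \triangleright_s \mathit{Init}_2(i_2)$, for all $i_1$
and $i_2$. If $(\triangleright_s, \triangleright_e)$ is a state refinement then $(\triangleright_i, \triangleright_e)$ is a refinement.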

Our plan to derive a refinement between the abstract and con- 
crete machines via the symbolic rule machine requires composition 
of refinements. 

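Lemma 8.6 (Refinement Composition). Let $(\triangleright_i^{12}, \triangleright_e^{12})$ be a
refinement between $M_1$ and $M_2$, and $(\triangleright_i^{23}, \triangleright_e^{23})$ a refinement between
$M_2$ and $M_3$. The pair $(\triangleright_i^{23} \circ \triangleright_i^{12},\ \triangleright_e^{23} \circ \triangleright_e^{12})$ that composes the
matching relations for initial data and events on each layer is a
refinement between $M_1$ and $M_3$. This can be summarized in the
following diagram:

$$\begin{array}{ccl}
i_1 & \quad & s_1 \xrightarrow{t_1}{}^{*} \quad \text{(dashed)} \\
\triangleright_i^{12} & & \qquad [\triangleright_e^{12}] \quad \text{(dashed)} \\
i_2 & & s_2 \xrightarrow{t_2}{}^{*} \quad \text{(dashed)} \\
\triangleright_i^{23} & & \qquad [\triangleright_e^{23}] \quad \text{(dashed)} \\
i_3 & & s_3 \xrightarrow{t_3}{}^{*}
\end{array}$$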
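
For finite relations, the composition used in Lemma 8.6 can be written out directly. The sketch below represents a relation as a Python set of pairs; the function name `compose` and the sample relations are hypothetical.

```python
def compose(r23, r12):
    """Relational composition r23 o r12: relate a to c via some shared b."""
    return {(a, c) for (a, b) in r12 for (b2, c) in r23 if b == b2}
```

Applied to the input-data relations and to the event relations of two stacked refinements, this yields the two components of the composed refinement.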



9. Refinements Between Concrete and Abstract 

In this section, we show that (1) the concrete machine refines the
symbolic rule machine, and (2) vice versa. Using (1) we will be
able to show in §10 that the concrete machine is noninterfering.
From (2) we know that the concrete machine faithfully implements
the abstract one, exactly reflecting its execution traces.

Abstract and symbolic rule machines. The symbolic rule machine
(with the rule table $\mathcal{R}^{abs}$) is a simple reformulation of the abstract
machine. Their step relations are (extensionally) equal, and started
from the same input data they emit the same traces.

Definition 9.1 (Abstract and symbolic rule machines as generic
machines). For both abstract and symbolic rule machines, input
data is a 4-tuple $(p, \mathit{args}, n, L)$ where $p$ is a program, $\mathit{args}$ is a
list of atoms (the initial stack), and $n$ is the size of the memory,
initialized with $n$ copies of $0@L$. The initial PC is $0@L$.

Lemma 9.2. The symbolic rule machine instantiated with the rule
table $\mathcal{R}^{abs}$ refines the abstract machine through $(=, =)$.

Concrete machine refines symbolic rule machine. We prove this
refinement using a fixed but arbitrary rule table, $\mathcal{R}$, an abstract
lattice of labels, and a concrete lattice of tags. The proof uses the
correctness of the fault handler (§7), so we assume that the fault
handler of the concrete machine corresponds to the rule table of the
symbolic rule machine ($\phi = \phi_{\mathcal{R}}$) and that the encoding of abstract
labels as integer tags is correct.

Definition 9.3 (Concrete machine as generic machine). The input
data of the concrete machine is a 4-tuple $(p, \mathit{args}, n, T)$ where $p$ is
a program, $\mathit{args}$ is a list of concrete atoms (the initial stack), and
the initial memory is $n$ copies of $0@T$. The initial PC is $0@T$. The
machine starts in user mode, the cache is initialized with an illegal
opcode so that the first instruction always faults (giving the fault
handler a chance to run and install a correct rule without requiring
the initialization process to invent one), and the fault handler code
parameterizing the machine is installed in the initial privileged
instruction memory $\phi$.

The input data and events of the symbolic rule and concrete 
machines are of different kinds; they are matched using relations 
(>f and >e respectively) stipulating that payload values should be 
equal and that labels should correspond to tags modulo the function 
Tag of the concrete lattice. 

args' = map (X(n@L). naTag(L)) args 
(p, args, n, L) >? (p, args', n, Tag(i)) n&L >g n@Tag(i) 

Theorem 9.4. The concrete IFC machine refines the symbolic rule 
machine, through ($\triangleright^c_i$, $\triangleright^c_e$). 

We prove this theorem by a refinement via states (Lemma 9.7); 
this, in turn, relies on two technical lemmas (9.5 and 9.6). 

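The matching relation $\triangleright^c_s$ between the states of the concrete and 
symbolic rule machines is defined so that 

1. $i_q \triangleright^c_i i_c \Rightarrow Init_q(i_q) \triangleright^c_s Init_c(i_c)$, 

2. ($\triangleright^c_s$, $\triangleright^c_e$) is a state refinement of the symbolic rule machine into 
the concrete machine. 

Define $\triangleright^c_s \subseteq S_q \times S_c$ by 

$$\frac{\mathcal{R} \vdash \kappa \qquad \sigma_q \triangleright_\sigma \sigma_c \qquad \mu_q \triangleright_m \mu_c}{\mu_q\ [\sigma_q]\ n@L \;\;\triangleright^c_s\;\; \mathsf{u},\ \kappa,\ \mu_c,\ [\sigma_c],\ n@Tag(L)} \qquad (2)$$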

where the new notations are defined as follows. The relation $\triangleright_m$ demands 
that the memories be equal up to the conversion of labels to 
concrete tags. The relation $\triangleright_\sigma$ on stacks is similar, but additionally 
requires that return frames in the concrete stack have their privilege 
bit set to u. The basic idea is to match, in $\triangleright^c_s$, only concrete 
states that are in user mode. We also need to track an extra invariant, 
$\mathcal{R} \vdash \kappa$, which means that the cache $\kappa$ is consistent with the 
table $\mathcal{R}$, i.e., $\kappa$ never lies. More precisely, the output part of $\kappa$ 
represents the result of applying the symbolic rule judgment of $\mathcal{R}$ 
to the opcode and labels represented in the input part of $\kappa$. 

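$$\mathcal{R} \vdash [\kappa_i, \kappa_o] \;\triangleq\; \forall\, opcode\ L_1\ L_2\ L_3\ L_{pc},\quad \kappa_i = \big[\, opcode \mid Tag(L_{pc}) \mid Tag(L_1) \mid Tag(L_2) \mid Tag(L_3) \,\big] \;\Rightarrow$$
$$\exists\, L_{rpc}\ L_r,\quad \vdash_{\mathcal{R}} (L_{pc}, L_1, L_2, L_3) \leadsto_{opcode} L_{rpc}, L_r \;\wedge\; \kappa_o = (Tag(L_{rpc}), Tag(L_r))$$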
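To make the invariant concrete, the following Python sketch checks cache consistency by brute force over a finite label set. It is an illustration only: the rule table is modeled as a function `rules` that returns the pair of result labels or None when the rule does not apply, all four input tags are assumed to encode labels (the don't-care tags of the real machine are ignored), and `toy_rules` is a hypothetical table, not the one used in the paper.

```python
from itertools import product


def cache_consistent(rules, opcodes, labels, tag_of, cache_in, cache_out):
    """Check that one cache line agrees with the rule table `rules`."""
    opcode, *in_tags = cache_in
    if opcode not in opcodes:
        return True  # no opcode matches, so the implication holds vacuously
    for l_pc, l1, l2, l3 in product(labels, repeat=4):
        if [tag_of(l) for l in (l_pc, l1, l2, l3)] != in_tags:
            continue  # these labels are not the ones encoded in the input
        result = rules(opcode, l_pc, l1, l2, l3)
        if result is None:
            return False  # the cache allows a step that the table forbids
        l_rpc, l_r = result
        if cache_out != (tag_of(l_rpc), tag_of(l_r)):
            return False  # the cached output disagrees with the table
    return True


ORDER = {"L": 0, "H": 1}
TAG_OF = ORDER.__getitem__


def join(a, b):
    return a if ORDER[a] >= ORDER[b] else b


def toy_rules(opcode, l_pc, l1, l2, l3):
    # Hypothetical rule: Add keeps the PC label and joins the operand labels.
    if opcode == "Add":
        return (l_pc, join(l1, l2))
    return None
```

An initial cache holding an illegal opcode is trivially consistent under this check, which matches the initialization described in Definition 9.3.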

To prove refinement via states, we must account for two situa- 
tions. First, suppose the concrete machine can take a user step. In 
this case, we match that step with a single symbolic rule machine 
step. We write $cs^\pi$ to denote a concrete state cs whose privilege bit 
is $\pi$. 

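Lemma 9.5 (Refinement, non-faulting concrete step). Let $cs^u_1$ be a 
concrete state and suppose that $cs^u_1 \xrightarrow{\alpha_c} cs^u_2$. Let $qs_1$ be a symbolic 
rule machine state with $qs_1 \triangleright^c_s cs^u_1$. Then there exist $qs_2$ and $\alpha_a$ such 
that $qs_1 \xrightarrow{\alpha_a} qs_2$, with $qs_2 \triangleright^c_s cs^u_2$, and $\alpha_a\ [\triangleright^c_e]\ \alpha_c$. Graphically: 

[Diagram: a commuting square with $qs_1 \xrightarrow{\alpha_a} qs_2$ on top, $cs^u_1 \xrightarrow{\alpha_c} cs^u_2$ on the bottom, and $\triangleright^c_s$ relating each symbolic state to the concrete state below it.] 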

Since the concrete machine is able to make a user step, the input 
part of the cache must match the opcode and data of the current 
state. But the invariant $\mathcal{R} \vdash \kappa$ says that the corresponding symbolic 
rule judgment holds. Hence the symbolic rule machine can also 
make a step from $qs_1$, as required. 

Proof. We know that $qs_1 \triangleright^c_s cs^u_1$. By inverting (2), $qs_1$ and $cs^u_1$ are at 
the same opcode with the same stack and memory (up to translation 
between labels and tags), and $\mathcal{R} \vdash \kappa(cs_1)$. Thus $\kappa(cs_1)$ matches a 
line of the symbolic IFC rule table, and since the concrete machine 
performs a user step from $cs^u_1$ to $cs^u_2$, it is a line that allows a step 
to be taken. We conclude that the symbolic rule machine is able to 
perform the step to $qs_2$ as required. □ 

The second case is when the concrete machine faults into kernel 
mode and returns to user mode after some number of steps. 

Lemma 9.6 (Refinement, faulting concrete step). Let $cs^u_0$ be a concrete 
state, and suppose that the concrete machine does a faulting 
step to $cs^k_1$, stays in kernel mode until $cs^k_n$, and then exits kernel 
mode by stepping to $cs^u_{n+1}$. Let $qs_0$ be a state of the symbolic rule 
machine that matches $cs^u_0$. Then $qs_0 \triangleright^c_s cs^u_{n+1}$. 

To prove this lemma, we must consider two cases. If the corre- 
sponding symbolic rule judgment holds, then we apply Lemma 7.1 
to conclude directly — i.e., the machine exits kernel code into user 
mode. Otherwise, we apply Lemma 7.2 and derive a contradiction 
that the fault handler ends in a failing state in kernel mode. 

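Lemmas 9.5 and 9.6 can be summarized graphically by: 

[Diagram: for Lemma 9.5, a commuting square with $qs_1 \to qs_2$ above $cs^u_1 \to cs^u_2$ and vertical edges $\triangleright^c_s$; for Lemma 9.6, a single symbolic state $qs_0$ related by $\triangleright^c_s$ to both $cs^u_0$ and $cs^u_{n+1}$, the two ends of the concrete run through kernel mode.] 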



Proof. Since the concrete machine performs a faulting step from 
$cs_0$ to $cs_1$, we know that the current cache input, $\kappa_i(cs_1)$, corresponds 
to the current instruction and the tags it manipulates (they 
have been put there when entering kernel mode). Now, there are 
two cases. If evaluating the corresponding IFC rule at the symbolic 
rule level succeeds, then we apply Lemma 7.1 to conclude directly. 
Otherwise, we apply Lemma 7.2 and derive that the fault handler 
ends up in a failing state in kernel mode. This contradicts our initial 
hypothesis saying that the concrete machine performed a sequence 
of steps returning to user mode. □ 

Given two matching states of the concrete and symbolic rule ma- 
chines, and a concrete execution starting at that concrete state, these 
two lemmas can be applied repeatedly to build a matching execu- 
tion of the symbolic rule machine. There is just one last case to 
consider, namely when the execution ends with a fault into ker- 
nel mode and never returns to user mode. However, no output is 
produced in this case, guaranteeing that the full trace is matched. 
We thus derive the following refinement via states, of which Theo- 
rem 9.4 is a corollary. 

Lemma 9.7. The pair ($\triangleright^c_s$, $\triangleright^c_e$) defines a refinement via states 
between the symbolic rule machine and the concrete machine. 

Concrete machine refines abstract machine By composing the 
refinement of Lemma 9.2 and the refinement of Theorem 9.4 instantiated 
to the concrete machine running $\phi_{\mathcal{R}^{abs}}$, we can conclude 
that the concrete machine refines the abstract one. 

Abstract machine refines concrete machine The previous refinement, 
($\triangleright^c_s$, $\triangleright^c_e$), would also hold if the fault handler never returned 
when called. So, to ensure the concrete machine reflects the behaviors 
of the abstract machine, we next prove an inverse refinement: 

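Theorem 9.8. The abstract IFC machine refines the concrete IFC 
machine via ($\triangleright^{-c}_i$, $\triangleright^{-c}_e$), where $\triangleright^{-c}_i$ and $\triangleright^{-c}_e$ are the relational 
inverses of $\triangleright^c_i$ and $\triangleright^c_e$. 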

This guarantees that traces of the abstract machine are also 
emitted by the concrete machine. As above we use the symbolic 
rule machine as an intermediate step and show a state refinement 
of the concrete into the symbolic rule machine. We rely on the 
following lemma, where $\triangleright^{-c}_s$ is the inverse of $\triangleright^c_s$. 

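Lemma 9.9 (Forward refinement). Let $qs_0$ and $cs_0$ be two states 
with $cs_0 \triangleright^{-c}_s qs_0$. Suppose that the symbolic rule machine takes a 
step $qs_0 \xrightarrow{\alpha_a} qs_1$. Then there exist concrete state $cs_1$ and action 
$\alpha_c$ such that $cs_0 \xrightarrow{\alpha_c}{}^{*} cs_1$, with $cs_1 \triangleright^{-c}_s qs_1$ and $\alpha_c\ [\triangleright^{-c}_e]\ \alpha_a$. 

[Diagram: a commuting square with the concrete run $cs_0 \to cs_1$ on top, the symbolic step $qs_0 \to qs_1$ on the bottom, and $\triangleright^{-c}_s$ relating each concrete state to the symbolic state below it.] 

where $\triangleright^{-c}_s$ and $\triangleright^{-c}_e$ denote the inverses of $\triangleright^c_s$ and $\triangleright^c_e$, respectively. 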

To prove this lemma, we consider two cases. If the cache input 
of $cs_0$ matches the opcode and data of $cs_0$, then the concrete 
machine can take a step $cs_0 \xrightarrow{\alpha_c} cs_1$. Moreover, $\mathcal{R} \vdash \kappa$ in $cs_0$ 
says the cache output is consistent with the symbolic rule judgment, 
so the tags in $\alpha_c$ and $cs_1$ are properly related to the labels in 
$\alpha_a$ and $qs_1$. Otherwise, a cache fault occurs, loading the cache 
input and calling the fault handler. By Lemma 7.1 and the fact that 
$qs_0 \xrightarrow{\alpha_a} qs_1$, the cache output is computed to be consistent with 
$\mathcal{R}$, and this allows the concrete step as claimed. 

Proof. Because $cs_0 \triangleright^{-c}_s qs_0$, the cache is consistent with the symbolic 
rule table $\mathcal{R}$. If the cache input matches the opcode and data 
of $cs_0$, then (because $qs_0 \xrightarrow{\alpha_a} qs_1$) the cache output must allow a 
step $cs_0 \xrightarrow{\alpha_c} cs_1$ as required. On the other hand, if the cache input 
does not match the opcode and data of $cs_0$, then a cache fault 
occurs, loading the cache input and calling the fault handler. By 
Lemma 7.1 and the fact that $qs_0 \xrightarrow{\alpha_a} qs_1$, the cache output is computed 
to be consistent with $\mathcal{R}$, and this allows the concrete step as 
claimed. □ 



Discussion The two top-level refinement properties (9.4 and 9.8) 
share the same notion of matching relations but they have been 
proved independently in our Coq development. In the context of 
compiler verification [48, 73], another proof methodology has been 
favored: a backward simulation proof can be obtained from a proof 
of forward simulation under the assumption that the lower-level 
machine is deterministic. (CompCertTSO [73] also requires a receptiveness 
hypothesis that trivially holds in our context.) Since our 
concrete machine is deterministic, we could apply a similar tech- 
nique. However, unlike in compiler verification where it is common 
to assume that the source program has a well-defined semantics (i.e. 
it does not get stuck), we would have to consider the possibility that 
the high-level semantics (the symbolic rule machine) might block 
and prove that in this case either the IFC enforcement judgment is 
stuck (and Lemma 9.6 applies) or the current symbolic rule ma- 
chine state and matching concrete state are both ill-formed. 

10. Noninterference 

In this section we define TINI [1, 35] for generic machines (recall 
Definition 8.1), and present a set of unwinding conditions [30] suf- 
ficient to guarantee TINI for a generic machine (Theorem 10.3); 
we show that the abstract machine of §3 satisfies these unwinding 
conditions and thus satisfies TINI (Theorem 10.5), that TINI is pre- 
served by refinement (Theorem 10.6), and finally, using the fact that 
the concrete IFC machine refines the abstract one (Theorem 9.4), 
that the concrete machine satisfies TINI (Theorem 10.8). 

Termination-insensitive noninterference (TINI) To define non- 
interference, we need to talk about what can be observed about the 
output trace produced by a run of a machine. 

Definition 10.1 (Observation). A notion of observation for a generic 
machine is a 3-tuple $(\Omega, \lfloor\cdot\rfloor_\cdot, \cdot \sim_\cdot \cdot)$. $\Omega$ is a set of observers (i.e., 
different degrees of power to observe), ranged over by o. For each 
$o \in \Omega$, $\lfloor\cdot\rfloor_o \subseteq E$ is a predicate of observability of events for 
observer o, and $\cdot \sim_o \cdot \subseteq I \times I$ is a relation of indistinguishability of 
input data for observer o. 

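The predicate $\lfloor e \rfloor_o$ is used to filter unobservable events from 
traces (written $\lfloor t \rfloor_o$): 

$$\lfloor \epsilon \rfloor_o \triangleq \epsilon \qquad\qquad \lfloor e \cdot t \rfloor_o \triangleq \begin{cases} e \cdot \lfloor t \rfloor_o & \text{if } \lfloor e \rfloor_o \\ \lfloor t \rfloor_o & \text{otherwise} \end{cases}$$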

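Also a notion of indistinguishability of traces (written $t_1 \approx t_2$) is 
defined inductively: 

$$\frac{}{\epsilon \approx t} \qquad \frac{}{t \approx \epsilon} \qquad \frac{t_1 \approx t_2}{e \cdot t_1 \approx e \cdot t_2} \qquad (3)$$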

This definition truncates the longer trace to the same length as the 
shorter and then demands that the remaining elements be pairwise 
identical. 
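The filtering of traces and the truncating comparison of (3) can be sketched in a few lines of Python. This is an illustration under assumptions: traces are lists of (payload, label) events, and the names `filter_trace`, `traces_indistinguishable`, and `low_observable` are hypothetical.

```python
def filter_trace(trace, observable):
    """Drop the events that the observer cannot see."""
    return [event for event in trace if observable(event)]


def traces_indistinguishable(t1, t2):
    """Compare two traces up to the length of the shorter one.

    zip stops at the end of the shorter list, which mirrors the two
    axioms of (3) that relate the empty trace to any trace.
    """
    return all(e1 == e2 for e1, e2 in zip(t1, t2))


def low_observable(event):
    # Hypothetical observer that sees only events labeled "L".
    _, label = event
    return label == "L"


run_a = [(1, "L"), (9, "H"), (2, "L")]
run_b = [(1, "L"), (8, "H")]          # stops early and differs only on "H"
run_c = [(1, "L"), (3, "L")]          # differs on a low event
```

Because the comparison truncates, `run_a` and `run_b` are indistinguishable after filtering even though `run_b` produces less output, while `run_a` and `run_c` are not.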

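Definition 10.2 (TINI). A machine $(S, E, I, \cdot \xrightarrow{\cdot} \cdot, Init)$ with a 
notion of observation $(\Omega, \lfloor\cdot\rfloor_\cdot, \cdot \sim_\cdot \cdot)$ satisfies TINI if, for any 
observer $o \in \Omega$, pair of indistinguishable initial data $i_1 \sim_o i_2$, 
and pair of executions $Init(i_1) \xrightarrow{t_1}{}^{*}$ and $Init(i_2) \xrightarrow{t_2}{}^{*}$, we have 
$\lfloor t_1 \rfloor_o \approx \lfloor t_2 \rfloor_o$. 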

Since a machine's program is part of its input data, this defini- 
tion of TINI, quantified over all observers and input data, is concep- 
tually quantified over all programs too. Because of the truncation 
of traces in (3), the observer cannot detect the absence of output, 
i.e., it cannot distinguish between successful termination, failure 
with an error, or entering an infinite loop with no observable output. 
This TINI property is standard for a machine with output [1, 35]. 4 

Unwinding conditions Having defined TINI for generic notions 
of machine and observation, we now explain a sufficient set of 
conditions for such a machine to have the TINI property and sketch 
a proof of TINI from these conditions. The proof technique is 
standard [30]. 

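A silent action cannot be observed, so we extend the given 
predicate $\lfloor e \rfloor_o$ to actions by stating that $\lfloor \tau \rfloor_o$ never holds. From 
this we inductively define a notion of indistinguishability of actions 
to observer o (written $\alpha_1 \approx_o \alpha_2$): 

$$\frac{}{\alpha \approx_o \alpha} \qquad \frac{\neg\lfloor \alpha_1 \rfloor_o \qquad \neg\lfloor \alpha_2 \rfloor_o}{\alpha_1 \approx_o \alpha_2} \qquad (4)$$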

Two actions are indistinguishable to o if either they are equal, or if 
neither can be observed by o. 

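Theorem 10.3. A machine $(S, E, I, \cdot \xrightarrow{\cdot} \cdot, Init)$ with notion of 
observation $(\Omega, \lfloor\cdot\rfloor_\cdot, \cdot \sim_\cdot \cdot)$ satisfies TINI if, for each $o \in \Omega$, 
there exist two relations, indistinguishability of states to observer o 
(written $s_1 \approx_o s_2$) and observability of states to observer o (written 
$\lfloor s \rfloor_o$), satisfying four sanity conditions 

$$i_1 \sim_o i_2 \;\Rightarrow\; Init(i_1) \approx_o Init(i_2) \qquad (5)$$
$$s_1 \approx_o s_2 \;\Rightarrow\; s_2 \approx_o s_1 \qquad (6)$$
$$s_1 \approx_o s_2 \;\Rightarrow\; (\lfloor s_1 \rfloor_o \Leftrightarrow \lfloor s_2 \rfloor_o) \qquad (7)$$
$$(\lfloor \alpha \rfloor_o \wedge s \xrightarrow{\alpha} s') \;\Rightarrow\; \lfloor s \rfloor_o \qquad (8)$$

and three unwinding conditions, assuming $s_1 \approx_o s_2$ and $s_1 \xrightarrow{\alpha_1} s_1'$: 

$$(\lfloor s_1 \rfloor_o \wedge s_2 \xrightarrow{\alpha_2} s_2') \;\Rightarrow\; (\alpha_1 \approx_o \alpha_2 \wedge s_1' \approx_o s_2') \qquad (9)$$
$$(\neg\lfloor s_1 \rfloor_o \wedge \neg\lfloor s_1' \rfloor_o) \;\Rightarrow\; s_1' \approx_o s_2 \qquad (10)$$
$$(\neg\lfloor s_1 \rfloor_o \wedge \lfloor s_1' \rfloor_o \wedge \lfloor s_2' \rfloor_o \wedge s_2 \xrightarrow{\alpha_2} s_2') \;\Rightarrow\; s_1' \approx_o s_2' \qquad (11)$$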



We outline the proof, which motivates each of the sanity and 
unwinding conditions. To prove TINI we must consider pairs of 
traces of machine evaluations starting from initial states $Init(i_1)$ 
and $Init(i_2)$ and show that, after filtering for observability, these 
pairs of traces are indistinguishable. For the proof, we also maintain 
the invariant that the pairs of states reached by the two evaluations 
are indistinguishable. We are given that $i_1 \sim_o i_2$, so by (5) the 
initial states are indistinguishable, as are the traces emitted so far 
(namely $\epsilon$). 

Now suppose the two evaluations have arrived at two indistinguishable 
states, $s_1 \approx_o s_2$, and that the filtered traces emitted so 
far are indistinguishable. If $s_1$ can take a step, $s_1 \xrightarrow{\alpha_1} s_1'$, what is 
possible for steps from $s_2$? (We may assume that $s_2 \xrightarrow{\alpha_2} s_2'$: if no 
step is possible from $s_2$ then we are already done because (3), used 
in the definition of TINI, truncates the trace from $s_1$ at this point.) 
Proceed by cases on observability of $s_1$. 

(9) says that, if $\lfloor s_1 \rfloor_o$, then the new states, $s_1'$ and $s_2'$, and the 
emitted traces remain indistinguishable. 

On the other hand, suppose $\neg\lfloor s_1 \rfloor_o$; proceed by cases on observability 
of $s_1'$. (10) says that, if $\neg\lfloor s_1' \rfloor_o$, then $s_1' \approx_o s_2$; and by 
(8), since $s_1$ is unobservable, $\alpha_1$ must be unobservable, so the filtered 
emitted traces remain indistinguishable. 



4 It is called "progress-insensitive noninterference" in a recent survey [35]. 
We have stated it for inductively defined executions and traces (1), which 
is all we need in this paper, but it can easily be lifted to coinductive 
executions and traces: not only successfully terminating and finitely failing 
executions, but also infinite executions. This holds because TINI is a 2- 
safety hyperproperty [18]; a formal proof of this can be found in our Coq 
development. 



Finally, consider the case where $\neg\lfloor s_1 \rfloor_o$ and $\lfloor s_1' \rfloor_o$. Then $\neg\lfloor s_2 \rfloor_o$ (by 
(7)), and $\alpha_1$ and $\alpha_2$ are both unobservable by (8). Consider cases 
on observability of $s_2'$. The filtered traces emitted up to $s_1'$ and $s_2'$ 
are indistinguishable, and if $\lfloor s_2' \rfloor_o$ we are done by (11). If $\neg\lfloor s_2' \rfloor_o$, 
we are in a case symmetric to the paragraph above; by (6) and (10) 
we have $s_1 \approx_o s_2'$, and again the filtered traces emitted up to these 
points are indistinguishable. □ 

TINI for abstract IFC machine We now instantiate Theorem 10.3 
with the abstract machine defined in §3, showing it satisfies TINI 
for the following notion of observation: 

Definition 10.4 (Observation for abstract machine). Let $\mathcal{L}$ be a 
lattice, with partial order $\le$. For the abstract machine, events n@L 
are atoms; we define indistinguishability of atoms, $a_1 \approx^a_o a_2$, as 
in (4) above. The notion of observation for the abstract machine is 
$(\mathcal{L}, \lfloor\cdot\rfloor^a_\cdot, \cdot \sim^a_\cdot \cdot)$, where 

$$\lfloor n@L \rfloor^a_o \;\triangleq\; L \le o$$
$$(p, args_1, n, L) \sim^a_o (p, args_2, n, L) \;\triangleq\; args_1\ [\approx^a_o]\ args_2.$$

(On the right-hand side of the second equation, $[\approx^a_o]$ is indistinguishability 
of atoms, lifted to lists as in Definition 8.2.) 
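As an illustration of this notion of observation, here is a small Python sketch for a two-point lattice. The lattice `ORDER`, the function names, and the pointwise, same-length reading of the lifting to lists are assumptions made for the example.

```python
ORDER = {"L": 0, "H": 1}  # hypothetical two-point lattice, L below H


def flows(l1, l2):
    """The partial order of the lattice."""
    return ORDER[l1] <= ORDER[l2]


def observable(atom, observer):
    """An atom n@L is observable when L flows to the observer."""
    _, label = atom
    return flows(label, observer)


def atoms_indistinguishable(a1, a2, observer):
    """Equal atoms, or two atoms that are both unobservable, as in (4)."""
    return a1 == a2 or not (observable(a1, observer) or observable(a2, observer))


def inputs_indistinguishable(i1, i2, observer):
    """Same program, memory size and label; argument lists related pointwise."""
    p1, args1, n1, l1 = i1
    p2, args2, n2, l2 = i2
    return ((p1, n1, l1) == (p2, n2, l2)
            and len(args1) == len(args2)
            and all(atoms_indistinguishable(a, b, observer)
                    for a, b in zip(args1, args2)))


input_a = (["Output"], [(1, "L"), (5, "H")], 2, "L")
input_b = (["Output"], [(1, "L"), (9, "H")], 2, "L")
```

The two inputs differ only in a high atom, so a low observer cannot tell them apart, whereas a high observer can.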

To instantiate Theorem 10.3 we must exhibit relations of ob- 
servability and indistinguishability on states. We outline them here. 
The actual proof of the sanity and unwinding conditions can be 
found in the formal development. 

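A state $s = (\mu\ [\sigma]\ pc)$ of the abstract machine is observable 
by observer $o \in \mathcal{L}$, written $\lfloor s \rfloor_o$, whenever $pc = n@L_{pc}$ is itself 
observable, i.e., $L_{pc} \le o$. 

Indistinguishability of states is defined inductively: 

$$\frac{\lfloor pc \rfloor_o \qquad \sigma_1\ [\approx^a_o]\ \sigma_2 \qquad \mu_1\ [\approx^a_o]\ \mu_2}{\mu_1\ [\sigma_1]\ pc \;\approx^a_o\; \mu_2\ [\sigma_2]\ pc} \qquad\qquad \frac{\neg\lfloor pc_1 \rfloor_o \qquad \neg\lfloor pc_2 \rfloor_o \qquad \sigma_1 \sim^a_o \sigma_2 \qquad \mu_1\ [\approx^a_o]\ \mu_2}{\mu_1\ [\sigma_1]\ pc_1 \;\approx^a_o\; \mu_2\ [\sigma_2]\ pc_2}$$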

Here we abuse the notation of lifting, using it for memories 
and stacks (two stack elements are indistinguishable if they are in- 
distinguishable atoms, or are both return stack frames, with indis- 
tinguishable return addresses). 

For non-observable states (right), the relation is more permis- 
sive. Indeed, the abstract IFC machine steps from an observable 
state to a non-observable state when, e.g., branching on the value 
of a secret. When that happens, the tight correspondence on states 
no longer holds. Depending on the value of a secret, the machine 
could, e.g., jump to different instruction addresses or update the 
memory in different ways. The interesting point to note is the 
relation $\cdot \sim^a_o \cdot$, which we use to relate stacks. It says that only the 
parts of the stacks that are below the most recent call to a function 
from an observable state (including the return stack frame element) 
are related through $[\approx^a_o]$. In this way we relax the correspondence 
between call stacks of two machines (allowing them to, e.g., put different 
numbers of values on their operand stacks, perform more or 
fewer function calls, ...), while at the same time keeping the invariant 
that holds on the "observable" part of the stacks, which we 
will need when proving condition (11) for the abstract machine. 

Theorem 10.5. The relations $\lfloor\cdot\rfloor^a_\cdot$ and $\cdot \approx^a_\cdot \cdot$ satisfy the sanity 
and unwinding conditions of Theorem 10.3; thus, the abstract IFC 
machine has TINI. 

TINI preserved by refinement 

Theorem 10.6 (TINI preservation). Suppose that generic machine 
$M_2$ refines $M_1$ by refinement ($\triangleright_i$, $\triangleright_e$) and that each machine is 
equipped with a notion of observation. Suppose that, for all observers 
$o_2$ of $M_2$, there exists an observer $o_1$ of $M_1$ such that the 
following compatibility conditions hold for all $e_1, e_1' \in E_1$, all 
$e_2, e_2' \in E_2$, and all $i_2, i_2' \in I_2$. 

1. $e_1 \triangleright_e e_2 \Rightarrow (\lfloor e_1 \rfloor_{o_1} \Leftrightarrow \lfloor e_2 \rfloor_{o_2})$ 

2. $i_2 \sim_{o_2} i_2' \Rightarrow \exists\, i_1 \sim_{o_1} i_1'.\ (i_1 \triangleright_i i_2 \wedge i_1' \triangleright_i i_2')$ 

3. $(e_1 \approx_{o_1} e_1' \wedge e_1 \triangleright_e e_2 \wedge e_1' \triangleright_e e_2') \Rightarrow e_2 \approx_{o_2} e_2'$ 

Then, if $M_1$ has TINI, $M_2$ also has TINI. 

instr ::= 
  | ...              extensions to instruction set 
  | Alloc            allocate a new frame 
  | SizeOf           fetch frame size 
  | Eq               value equality 
  | SysCall id       system call 
  | GetOff           extract pointer offset 
  | Pack             atom from payload and tag 
  | Unpack           atom into payload and tag 
  | PushCachePtr     push cache address on stack 
  | Dup n            duplicate atom on stack 
  | Swap n           swap two data atoms on stack 

Figure 10. Additional instructions for extensions 

Some formulations of noninterference are subject to the refine- 
ment paradox [39], in which refinements of a noninterferent system 
may violate noninterference. We avoid this issue by employing a 
strong notion of noninterference that restricts the amount of non- 
determinism in the system and is thus preserved by any refinement 
(Theorem 10.6). 5 Since our abstract machine is deterministic, it is 
easy to show this strong notion of noninterference for it. In §13 
we discuss a possible technique for generalizing to the concurrent 
setting while preserving a high degree of determinism. 

TINI for concrete machine with IFC fault handler It remains 
to define a notion of observation on the concrete machine, instanti- 
ating the definition of TINI for this machine. This definition refers 
to a concrete lattice CL, which must be a correct encoding of an abstract 
lattice $\mathcal{L}$: the lattice operators genBot, genJoin, and genFlows 
must satisfy the specifications in §7. 

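Definition 10.7 (Observation for the concrete machine). Let $\mathcal{L}$ 
be an abstract lattice, and CL be correct with respect to $\mathcal{L}$. The 
observation for the concrete machine is $(\mathcal{L}, \lfloor\cdot\rfloor^c_\cdot, \cdot \sim^c_\cdot \cdot)$, where 

$$\lfloor n@T \rfloor^c_o \;\triangleq\; Lab(T) \le o,$$
$$(p, args_1', n, T) \sim^c_o (p, args_2', n, T) \;\triangleq\; args_1\ [\approx^a_o]\ args_2,$$

and $args_i' = map\ (\mathsf{fun}\ n@L \to n@Tag(L))\ args_i$. 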

Finally, we prove that the backward refinement proved in §9 sat- 
isfies the compatibility constraints of Theorem 10.6, so we derive 
the main result: 

Theorem 10.8. The concrete IFC machine running the fault handler 
$\phi_{\mathcal{R}^{abs}}$ satisfies TINI. 

11. An Extended System 

Thus far we have described our model and proof results only for a 
simple machine architecture and IFC discipline. Our Coq develop- 
ment actually works with a significantly more sophisticated model, 
extending the basic machine architecture with a frame-based memory 
model supporting dynamic allocation and a system call mechanism 
for adding special-purpose primitives. Building on these features, 
we define an abstract IFC machine that uses sets of principals 
as its labels and a corresponding concrete machine implementation 
where tags are pointers to dynamically allocated representations of 



5 The recent noninterference proof for the seL4 microkernel [57, 58] works 
similarly (see §12). 



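$$\frac{\iota(n) = Alloc \qquad alloc\ k\ (L \vee L_{pc})\ a\ \mu = (id, \mu')}{\mu\ [(Int\ k)@L, a, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu'\ [(Ptr\ (id, 0))@L, \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = SizeOf \qquad length\ (\mu(id)) = k}{\mu\ [(Ptr\ (id, o))@L, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [(Int\ k)@L, \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = GetOff}{\mu\ [(Ptr\ (id, o))@L, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [(Int\ o)@L, \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = Eq}{\mu\ [v_1@L_1, v_2@L_2, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [(Int\ (v_1 == v_2))@(L_1 \vee L_2), \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = SysCall\ id \qquad T(id) = (k, f) \qquad f(\sigma_1) = v@L \qquad length\ (\sigma_1) = k}{\mu\ [\sigma_1 \mathbin{+\!\!+} \sigma_2]\ n@L_{pc} \xrightarrow{\tau} \mu\ [v@L, \sigma_2]\ (n+1)@L_{pc}}$$

Figure 11. Semantics of selected new abstract machine instructions 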



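$$\frac{\iota(n) = Alloc \qquad alloc\ k\ \mathsf{u}\ a\ \mu = (id, \mu') \qquad \mu(cache) = |\, Alloc \mid T_{pc} \mid T_1 \mid T_D \mid T_D \,\|\, T_{rpc} \mid T_r \,|}{\mathsf{u}\ \mu\ [(Int\ k)@T_1, a, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \mu'\ [(Ptr\ (id, 0))@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\frac{\phi(n) = Alloc \qquad alloc\ k\ \mathsf{k}\ a\ \mu = (id, \mu')}{\mathsf{k}\ \mu\ [(Int\ k)@\_, a, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \mu'\ [(Ptr\ (id, 0))@T_D, \sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = PushCachePtr}{\mathsf{k}\ \mu\ [\sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \mu\ [(Ptr\ (cache, 0))@T_D, \sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = Unpack}{\mathsf{k}\ \mu\ [v_1@v_2, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \mu\ [v_2@T_D, v_1@T_D, \sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = Pack}{\mathsf{k}\ \mu\ [v_2@\_, v_1@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \mu\ [v_1@v_2, \sigma]\ (n+1)@T_D}$$

$$\frac{\iota(n) = SysCall\ id \qquad T(id) = (k, n') \qquad length\ (\sigma_1) = k}{\mathsf{u}\ \mu\ [\sigma_1 \mathbin{+\!\!+} \sigma_2]\ n@T \xrightarrow{\tau} \mathsf{k}\ \mu\ [\sigma_1 \mathbin{+\!\!+} (n+1@T, \mathsf{u});\, \sigma_2]\ n'@T_D}$$

Figure 12. Semantics of selected new concrete machine instructions 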



these sets. While still much less complex than the real SAFE sys- 
tem, this extended model shows how our basic approach can be 
incrementally scaled up to more realistic designs. Verifying these 
extensions requires no major changes to the proof architecture of 
the basic system, serving as evidence of its robustness. 

Fig. 10 shows the new instructions supported by the extended 
model. Instructions PushCachePtr, Unpack, and Pack are used 
only by the concrete machine, for the compiled fault handler (hence 
they only have a kernel-mode stepping rule; they simply get stuck 
if executed outside kernel mode, or on an abstract machine). We 
also add two stack-manipulation instructions, Dup and Swap, to 
make programming the kernel routines more convenient. It remains 
true that any program for the abstract machine makes sense to run 
on the abstract rule machine and the concrete machine. For brevity, 
we detail stepping rules only for the extended abstract IFC machine 
(Fig. 11) and concrete machine (Fig. 12); corresponding extensions 
to the symbolic IFC rule machine are straightforward (we also omit 
rules for Dup and Swap). Individual rules are explained below. 

Dynamic memory allocation High-level programming languages 
usually assume a structured memory model, in which independently 
allocated frames are disjoint by construction and programs cannot 
depend on the relative placement of frames in memory. The SAFE 
hardware enforces this abstraction by attaching explicit runtime 
types to all values, distinguishing pointers from other data. Only 
data marked as pointers can be used to access memory. To obtain 
a pointer, one must either call the (privileged) memory manager to 
allocate a fresh frame or else offset an existing pointer. In partic- 
ular, it is not possible to "forge" a pointer from an integer. Each 
pointer also carries information about its base and bounds, and the 
hardware prevents it from being used to access memory outside of 
its frame. 

Frame-based memory model In our extended system, we model 
the user-level view of SAFE's memory system by adding a frame-structured 
memory (similar to [49]), distinguished pointers (so values, 
the payload field of atoms and the tag field of concrete atoms, 
can now either be an integer (Int n) or a pointer (Ptr p)), and an allocation 
instruction to our basic machines. We do this (nearly) uniformly 
at all levels of abstraction. 6 A pointer is a pair p = (id, o) of 
a frame identifier id and an offset o into that frame. In the machine 
state, the data memory $\mu$ is a partial function from pointers to individual 
storage cells that is undefined on out-of-frame pointers. By 
abuse of notation, $\mu$ is also a partial function from frame identifiers 
to frames, which are just lists of atoms. 

The most important new rule of the extended abstract machine 
is Alloc (Fig. 11). In this machine there is a separate memory 
region (assumed infinite) corresponding to each label. The auxiliary 
function alloc in the rule for Alloc takes a size k, the label (region) 
at which to allocate, and a default atom a; it extends $\mu$ with a fresh 
frame of size k, initializing its contents to a. It returns the id of the 
new frame and the extended memory $\mu'$. 
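A minimal Python sketch of such an allocator follows. It assumes one possible strategy, sequential allocation within each region, and represents a frame id as a (label, index) pair purely for illustration; in the model frame ids are abstract. The names `alloc` and `load` are hypothetical.

```python
def alloc(size, label, default, memory):
    """Return (frame_id, extended_memory) for a fresh frame in region `label`.

    `memory` maps frame ids to frames (lists of atoms) and is not mutated.
    """
    # The next index depends only on earlier allocations in the same region.
    index = sum(1 for (region, _) in memory if region == label)
    frame_id = (label, index)
    extended = dict(memory)
    extended[frame_id] = [default] * size
    return frame_id, extended


def load(memory, pointer):
    """Read the cell at pointer (id, offset); None models 'undefined'."""
    frame_id, offset = pointer
    frame = memory.get(frame_id)
    if frame is None or not 0 <= offset < len(frame):
        return None
    return frame[offset]


low_0, m1 = alloc(2, "L", (0, "L"), {})
high_0, m2 = alloc(3, "H", (0, "H"), m1)
low_1_after_high, m3 = alloc(1, "L", (0, "L"), m2)
low_1_without_high, _ = alloc(1, "L", (0, "L"), m1)
```

The second low allocation receives the same id whether or not a high allocation happened in between, which is the property that lets indistinguishable low pointers be syntactically equal.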

IFC and memory allocation We require that the frame identifiers 
produced by allocation at one label not be affected by allocations at 
other labels; e.g., alloc might allocate sequentially in each region. 
Thus, indistinguishability of low atoms is just syntactic equality, 
preserving Definition 10.4 from the simple abstract machine, which 
is convenient for proving noninterference, as we explain below. 
We allow a program to observe frame sizes using a new SizeOf 
instruction, which requires tainting the result of Alloc with L, 
the label of the size argument. There are also new instructions 
Eq, for comparing two values (including pointers) for equality, 
and GetOff, for extracting the offset field of a pointer into an 
integer. However, frame ids are intuitively abstract: the concrete 
representation of frame ids is not accessible, and pointers cannot be 
forged or output. The extended concrete machine stepping rules for 
these new instructions are analogous to the abstract machine rules, 
with the important exception of Alloc, which is discussed below. 

A few small modifications to existing instructions in the basic 
machine (Fig. 2) are needed to handle pointers properly. In 
particular: (i) Load and Store require pointer arguments and get 
stuck if the pointer's offset is out of range for its frame; (ii) Add 
takes either two integers or an integer and a pointer, where Int n + 
Int m = Int (n+m) and Ptr (id, o1) + Int o2 = Ptr (id, o1+o2); 
(iii) Output works only on integers, not pointers. Analogous modifications 
are needed in the concrete machine semantic rules. 
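The modified Add can be sketched in Python as follows. The classes `Int` and `Ptr` and the function `add` are hypothetical names; accepting the integer and the pointer in either order is an assumption of this sketch, since the text only gives the pointer-plus-integer equation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    n: int


@dataclass(frozen=True)
class Ptr:
    frame_id: object
    offset: int


def add(v1, v2):
    """Add two values; None models a stuck machine."""
    if isinstance(v1, Int) and isinstance(v2, Int):
        return Int(v1.n + v2.n)
    if isinstance(v1, Ptr) and isinstance(v2, Int):
        return Ptr(v1.frame_id, v1.offset + v2.n)
    if isinstance(v1, Int) and isinstance(v2, Ptr):
        # Assumed symmetric case: the offset moves, the frame id is kept.
        return Ptr(v2.frame_id, v2.offset + v1.n)
    return None  # two pointers cannot be added
```

Note that no case turns an integer into a pointer, so pointer arithmetic can only offset an existing pointer within the same frame id.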

6 It would be interesting to describe an implementation of the memory 
manager in a still-lower-level concrete machine with no built-in Alloc 
instruction, but we leave this as future work. 

Concrete allocator The extended concrete machine's semantics 
for Alloc differ from those of the abstract machine in one key respect. 
Using one region per tag would not be a realistic strategy for 
a concrete implementation; e.g., the number of different tags might 
be extremely large. Instead, we use a single region for all user-mode 
allocations at the concrete level. We also collapse the separate user 
and kernel memories from the basic concrete machine into a single 
memory. Since we still want to be able to distinguish user and ker- 
nel frames, we mark each frame with a privilege mode (i.e., we use 
two allocation regions). Fig. 12 shows the corresponding concrete 
stepping rule for Alloc for two cases: non-faulting user mode and 
kernel mode. The concrete Load and Store rules prevent derefer- 
encing kernel pointers in user mode. The rule cache is now just 
a distinguished kernel frame cache; to access it, the fault handler 
uses the (privileged) PushCachePtr instruction. 

Proof by refinement As before, we prove noninterference for the 
concrete machine by combining a proof of noninterference of the 
abstract machine with a two-stage proof that the concrete machine 
refines the abstract machine. By using this approach we avoid some 
well-known difficulties in proving noninterference directly for the 
concrete machine. In particular, when frames allocated in low and 
high contexts share the same region, allocations in high contexts 
can cause variations in the precise pointer values returned for al- 
locations in low contexts, and these variations must be taken into 
account when defining the indistinguishability relation. For exam- 
ple, Banerjee and Naumann [8] prove noninterference by param- 
eterizing their indistinguishability relation with a partial bijection 
that keeps track of indistinguishable memory addresses. Our ap- 
proach, by contrast, defines pointer indistinguishability only at the 
abstract level, where indistinguishable low pointers are identical. 
This proof strategy still requires relating memory addresses when 
showing refinement, but this relation does not appear in the non- 
interference proof at the abstract level. The refinement proof itself 
uses a simplified form of memory injections [50]. The differences 
in the memory region structure of both machines are significant, 
but invisible to programs, since no information about frame ids is 
revealed to programs beyond what can be obtained by comparing 
pointers for equality. This restriction allows the refinement proof to 
go through straightforwardly. 

System calls To support the implementation of policy-specific 
primitives on top of the concrete machine, we provide a new system 
call instruction. The SysCall id instruction is parameterized by a 
system call identifier. The step relation of each machine is now 
parameterized by a table T that maps system call identifiers to their 
implementations. 

In the abstract and symbolic rule machines, a system call imple- 
mentation is an arbitrary Coq function that removes a list of atoms 
from the top of the stack and either puts a result on top of the stack 
or fails, halting the machine. The system call implementation is re- 
sponsible for computing the label of the result and performing any 
checks that are needed to ensure noninterference. 
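The abstract SysCall step of Fig. 11 can be sketched in Python like this. The sketch assumes the stack is a list with its top at index 0, labels are frozensets, and a failing call is modeled by returning None; `syscall_step`, `safe_div`, and `TABLE` are hypothetical names, and no claim is made that this particular primitive is the one used in the development.

```python
def syscall_step(table, call_id, stack, pc):
    """Perform one SysCall step; return (new_stack, new_pc) or None."""
    arity, impl = table[call_id]
    if len(stack) < arity:
        return None  # stuck: not enough arguments on the stack
    args, rest = stack[:arity], stack[arity:]
    result = impl(args)
    if result is None:
        return None  # the call failed, halting the machine
    address, pc_label = pc
    # The result replaces the arguments; the PC label is unchanged.
    return [result] + rest, (address + 1, pc_label)


def safe_div(args):
    """Hypothetical primitive: integer division labeled with the join."""
    (x, label_x), (y, label_y) = args
    if y == 0:
        return None
    return (x // y, label_x | label_y)


TABLE = {"div": (2, safe_div)}

stack = [(6, frozenset({1})), (3, frozenset({2})), (0, frozenset())]
outcome = syscall_step(TABLE, "div", stack, (10, frozenset()))
```

The implementation, not the machine, chooses the label of the result, which is why each abstract system call must be shown to preserve indistinguishability separately.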

In the concrete machine, system calls are implemented by ker- 
nel routines and the call table contains the entry points of these 
routines in the kernel instruction memory. Executing a system call 
involves inserting the return address on the stack (underneath the 
call arguments) and jumping to the corresponding entry point. The 
kernel code terminates either by returning a result to the user pro- 
gram or by halting the machine. 

This feature has no major impact on the proofs of noninterfer- 
ence and refinement. For noninterference, we must show that all 
the abstract system calls preserve indistinguishability of abstract 
machine states; for refinement, we show that each concrete sys- 
tem call correctly implements the abstract one using the machinery 
of §7. 

Labeling with sets of principals The full SAFE machine supports 
dynamic creation of security principals. In the extended model, 
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we make a first step toward dynamic principal creation by taking 
principals to be integers and instantiating the (parametric) lattice 
of labels with the lattice of finite sets of integers. 7 In this lattice, 
_L is 0, V is U, and < is C. We enrich our IFC model by adding a 
new classification primitive joinP that adds a principal to an atom's 
label, encoded using the system call mechanism described above. 
The operation of joinP is given by the following derived rule, which 
is an instance of the SysCall rule from Fig. 11. 

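$$
\frac{\iota(n) = \mathit{SysCall}\ \mathit{joinP}}
     {\mu\ [v@L_1,\ (\mathsf{Int}\ m)@L_2,\ \sigma]\ \ n@L_{pc}
      \;\xrightarrow{\tau}\;
      \mu\ [v@(L_1 \vee L_2 \vee \{m\}),\ \sigma]\ \ (n+1)@L_{pc}}
$$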
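
The set lattice and the effect of joinP on the stack can be mimicked in a few lines of Python. This is a sketch under simplifying assumptions: the memory component is omitted, atoms are plain pairs, and the names `join`, `flows`, and `step_join_p` are chosen for this illustration only.

```python
from typing import FrozenSet, List, Tuple

Label = FrozenSet[int]      # a finite set of principals (integers)
Atom = Tuple[int, Label]    # payload paired with its label

BOT: Label = frozenset()    # bottom of the lattice: the empty set


def join(l1: Label, l2: Label) -> Label:
    """Least upper bound: set union."""
    return l1 | l2


def flows(l1: Label, l2: Label) -> bool:
    """Lattice order: set inclusion."""
    return l1 <= l2


def step_join_p(stack: List[Atom], pc: Atom) -> Tuple[List[Atom], Atom]:
    """One joinP step; index 0 of `stack` is the top of the stack."""
    (v, l1), (m, l2), *rest = stack
    n, l_pc = pc
    new_label = join(join(l1, l2), frozenset({m}))
    return [(v, new_label)] + rest, (n + 1, l_pc)
```

For example, applying `step_join_p` to a stack whose top two atoms are `(42, {1})` and `(7, {2})` leaves `42` labeled with `{1, 2, 7}` and advances the program counter by one.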

At the concrete level, a tag is now a pointer to an array of prin- 
cipals (integers) stored in kernel memory. To keep the fault han- 
dler code simple, we do not maintain canonical representations of 
sets: one set may be represented by different arrays, and a given 
array may have duplicate elements. (As a consequence, the map- 
ping from abstract labels to tags is no longer a function; we return 
to this point below.) Since the fault handler generator in the ba- 
sic system is parametric in the underlying lattice, it doesn't require 
any modification. All we must do is provide concrete implementations
for the appropriate lattice operations: genJoin just allocates a
fresh array and concatenates both argument arrays into it; genFlows 
checks for array inclusion by iterating through one array and test- 
ing whether each element appears in the other; and genBot allo- 
cates a new empty array. Finally, we provide kernel code to imple- 
ment joinP, which requires two new privileged instructions, Pack 
and Unpack (Fig. 12), to manipulate the payload and tag fields of 
atoms; otherwise, the implementation is similar to that of genJoin.
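
The behaviour of the three lattice operations on the array representation can be modeled as follows. This Python sketch only mirrors their effect on kernel memory: the real genJoin, genFlows, and genBot are generators of machine code, and the `KernelMemory` class with integer frame indices as pointers is an assumption of the sketch.

```python
from typing import List


class KernelMemory:
    """Hypothetical allocate-only kernel memory; a tag is a frame index."""

    def __init__(self) -> None:
        self.frames: List[List[int]] = []

    def alloc(self, contents: List[int]) -> int:
        self.frames.append(list(contents))
        return len(self.frames) - 1

    def read(self, tag: int) -> List[int]:
        return self.frames[tag]


def gen_bot(mem: KernelMemory) -> int:
    """Bottom: a freshly allocated empty array."""
    return mem.alloc([])


def gen_join(mem: KernelMemory, t1: int, t2: int) -> int:
    """Join: a fresh array holding the concatenation (duplicates are kept)."""
    return mem.alloc(mem.read(t1) + mem.read(t2))


def gen_flows(mem: KernelMemory, t1: int, t2: int) -> bool:
    """Flows: every element of the first array appears in the second."""
    return all(x in mem.read(t2) for x in mem.read(t1))
```

Joining the same two tags in either order yields two distinct pointers to arrays with different contents, yet each flows to the other, which is exactly the non-canonicity noted above.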

A more realistic system would keep canonical representations 
of sets and avoid unnecessary allocation in order to improve its 
memory footprint and tag cache usage. But even with the present 
simplistic approach, both the code for the lattice operations and 
their proofs of correctness are significantly more elaborate than 
for the trivial two-point lattice. In particular, we need an additional 
code generator to build counted loops, e.g., for computing the join 
of two tags. 

genFor c =
  [Dup] ++ genIf (genLoop (c ++ [Push (-1), Add])) []
  where genLoop c = c ++ [Dup, Bnz (-(length c + 1))]

Here, c is a code sequence representing the loop body, which is ex- 
pected to preserve an index value on top of the stack; the generator 
builds code to execute that body repeatedly, decrementing the index 
each time until it reaches 0. The corresponding specification is 

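$$
\begin{array}{l}
P_n(\kappa, \sigma) := \exists\, T\, \sigma'.\ \sigma = n@T, \sigma' \wedge \mathit{Inv}(\kappa, \sigma) \\
Q_n(\kappa, \sigma) := \exists\, T\, \sigma'.\ \sigma = n@T, \sigma' \wedge \forall T'.\ \mathit{Inv}(\kappa, ((n-1)@T', \sigma')) \\
\forall n.\ 0 < n \implies \{P_n\}\ c\ \{Q_n\} \\
P(\kappa, \sigma) := \exists\, n\, T\, \sigma'.\ 0 \le n \wedge \sigma = n@T, \sigma' \wedge \mathit{Inv}(\kappa, \sigma) \\
Q(\kappa, \sigma) := \exists\, T\, \sigma'.\ \sigma = 0@T, \sigma' \wedge \mathit{Inv}(\kappa, \sigma) \\
\hline
\{P\}\ \mathit{genFor}\ c\ \{Q\}
\end{array}
$$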
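
To make the control flow of the generated loop concrete, the generator can be replayed on a toy untagged stack machine in Python. Several details are assumptions of this sketch: Bnz pops its argument and jumps relative to its own address, the two-argument `gen_if` takes its condition from the top of the stack and is built from a hypothetical `gen_skip`, and a `Nop` instruction is added solely so that loop iterations can be counted.

```python
def gen_skip(n):
    # Unconditional forward jump over the next n instructions.
    return [("Push", 1), ("Bnz", n + 1)]


def gen_if(t, f):
    # Pops the condition; runs t if it is nonzero, otherwise f (assumed encoding).
    f2 = f + gen_skip(len(t))
    return [("Bnz", len(f2) + 1)] + f2 + t


def gen_loop(c):
    # Run c, then jump back to its first instruction while the index is nonzero.
    return c + [("Dup",), ("Bnz", -(len(c) + 1))]


def gen_for(c):
    return [("Dup",)] + gen_if(gen_loop(c + [("Push", -1), ("Add",)]), [])


def run(code, stack):
    """Execute code; the top of the stack is the last list element.

    Returns the final stack and the number of Nop instructions executed.
    """
    stack, pc, nops = list(stack), 0, 0
    while 0 <= pc < len(code):
        op = code[pc]
        if op[0] == "Push":
            stack.append(op[1]); pc += 1
        elif op[0] == "Add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b); pc += 1
        elif op[0] == "Dup":
            stack.append(stack[-1]); pc += 1
        elif op[0] == "Nop":
            nops += 1; pc += 1
        elif op[0] == "Bnz":
            pc += op[1] if stack.pop() != 0 else 1
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack, nops
```

With the body `[("Nop",)]` and an initial index of 3 on top of the stack, the body runs three times and the index ends at 0, matching the shape of the postcondition Q; with an initial index of 0 the body is skipped entirely.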

To avoid reasoning about memory updates as far as possible, 
we code in a style where all local context is stored on the stack and 
manipulated using Dup and Swap. Although the resulting code is 
lengthy, it is relatively easy to automate the corresponding proofs. 

Stateful encoding of labels Changing the representation of tags
from integers to pointers requires modifying one small part of the
basic system proof. Recall that in §6 we described the encoding of
labels into tags as a pure function Lab. To deal with the memory-dependent
and non-canonical representation of sets described above,
the extended system instead uses a relation between an abstract label,
a concrete tag that encodes it, and a memory in which this tag
should be interpreted.

⁷ This lattice is statically known, but models dynamic creation by supporting
unbounded labels and having no top element.

If tags are pointers to data structures, it is crucial that these 
data structures remain intact as long as the tags appear in the 
machine state. We guarantee this by maintaining the very strong 
invariant that each execution of the fault handler only allocates 
new frames, and never modifies the contents of existing ones, 
except for the cache frame (which tags never point into). A more 
realistic implementation might use mutable kernel memory for 
other purposes and garbage collect unused tags; this would require 
a more complicated memory invariant. 
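
The interplay between the encoding relation and the allocation-only invariant can be illustrated with a small Python model. The function names `encodes` and `extends`, and the representation of kernel memory as a tuple of frames indexed by tag, are assumptions of this sketch; the cache frame is left out.

```python
from typing import FrozenSet, Tuple

Memory = Tuple[Tuple[int, ...], ...]  # immutable sequence of frames


def encodes(label: FrozenSet[int], tag: int, mem: Memory) -> bool:
    """Relation between an abstract label, a tag, and the memory interpreting it."""
    return 0 <= tag < len(mem) and frozenset(mem[tag]) == label


def extends(old: Memory, new: Memory) -> bool:
    """Invariant: the new memory only adds frames and leaves old ones intact."""
    return len(new) >= len(old) and new[:len(old)] == old
```

Under this model, whenever `encodes(label, tag, old)` and `extends(old, new)` both hold, `encodes(label, tag, new)` holds as well, which is why tags already present in the machine state keep their meaning across fault handler runs.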

The TINI formulation is similar in essence to the one in §10, but
some subtleties arise for concrete output events, since tags in events 
cannot be interpreted on their own anymore. We wish to (i) keep 
the semantics of the concrete machine independent of high-level 
policies such as IFC and (ii) give a statement of noninterference that 
does not refer to pointers. To achieve these seemingly contradictory 
aims, we model an event of the concrete machine as a pair of 
a concrete atom plus the whole state of the kernel memory. The 
resulting trace of concrete events is abstracted (i.e., interpreted in 
terms of abstract labels) only when stating and proving TINI. This 
is an idealization of what happens in the real SAFE machine, where 
communication of labeled data with the outside world involves 
cryptography. Modeling this is left as future work. 
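
The abstraction of concrete events can be sketched as follows. This Python fragment assumes the array representation of labels described earlier and models a concrete event as a pair of a (payload, tag) atom and a snapshot of kernel memory; the names `abstract_event`, `abstract_trace`, and `observe` are hypothetical, and the observer filter reflects the usual reading that an observer sees the events whose label flows to its own.

```python
from typing import FrozenSet, List, Tuple

Memory = Tuple[Tuple[int, ...], ...]
ConcreteEvent = Tuple[Tuple[int, int], Memory]   # ((payload, tag), kernel memory)
AbstractEvent = Tuple[int, FrozenSet[int]]       # (payload, label)


def abstract_event(event: ConcreteEvent) -> AbstractEvent:
    """Interpret the tag pointer in the memory snapshot carried by the event."""
    (payload, tag), mem = event
    return payload, frozenset(mem[tag])


def abstract_trace(trace: List[ConcreteEvent]) -> List[AbstractEvent]:
    return [abstract_event(e) for e in trace]


def observe(trace: List[AbstractEvent], observer: FrozenSet[int]) -> List[AbstractEvent]:
    """Keep only the events whose label is included in the observer's label."""
    return [(v, l) for (v, l) in trace if l <= observer]
```

Two concrete traces that differ in pointer values and memory layout can thus abstract to the same trace, so a noninterference statement phrased over abstract traces never mentions pointers.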

12. Related Work 

The SAFE design spans a number of research areas, and a compre- 
hensive overview of related work would be huge. We focus here on 
a small set of especially relevant points of comparison. The long 
version discusses additional related work. 

Language-based IFC Static approaches to IFC have generally 
dominated language-based security research [59, 65, 69, 83]; how- 
ever, statically enforcing IFC at the lowest level of a real system 
is challenging. Soundly analyzing native binaries with reasonable 
precision is hard (static IFC for low-level code usually stops at the 
bytecode level [10, 31, 34, 51]), even more so without the com- 
piler's cooperation (e.g., for stripped or obfuscated binaries). Proof- 
carrying code [9, 10, 31] and typed assembly language [53, 84, 85] 
have been used for enforcing IFC on low-level code without low- 
level analysis or adding the compiler to the TCB. In SAFE [24, 28] 
we follow a different approach, enforcing noninterference using 
purely dynamic checks, for arbitrary binaries in a custom-designed 
instruction set. The mechanisms we use for this are similar to those 
found in recent work on purely dynamic IFC for high-level lan- 
guages [2, 4-7, 32, 33, 36, 37, 55, 64, 67, 70, 74, 79]; however, as 
far as we know, we are the first to push these ideas to the lowest 
level. 

seL4 Murray et al. [57] recently demonstrated a machine- 
checked noninterference proof for the implementation of the seL4 
microkernel. This proof is carried out by refinement and reuses the 
specification and most of the existing functional correctness proof 
of seL4 [44]. Like the TINI property in this paper, the variant of intransitive
noninterference used by Murray et al. is preserved by refinement
because it implies a high degree of determinism [58]. This
organization of their proof was responsible for a significant saving 
in effort, even when factoring in the additional work required to re- 
move all observable non-determinism from the seL4 specification. 
Beyond these similarities, SAFE and seL4 rely on completely different
mechanisms to achieve different notions of noninterference.⁸
Whereas, in SAFE, each word of data has an IFC label and labels
are propagated on each instruction, the seL4 kernel maintains separation
between several large partitions (e.g., one partition can run
an unmodified version of Linux) and ensures that information is
conveyed between such partitions only in accordance with a fixed
access control policy.

⁸ Moreover, the notions of noninterference are different: seL4 admits intransitive
IFC policies (capturing the "where" dimension of declassification
[71]), while we consider transitive ones.

PROSPER In parallel work, Dam et al. [22, 23, 43] verified in- 
formation flow security for a tiny proof-of-concept separation ker- 
nel running on ARMv7 and using a Memory Management Unit for 
physical protection of memory regions belonging to different par- 
titions. The authors argue that noninterference is not well suited 
for systems in which components are supposed to communicate 
with each other. Instead, they use the bisimulation proof method to 
show trace equivalence between the real system and an ideal top- 
level specification that is secure by construction. As in seL4 [57], 
the proof methodology precludes an abstract treatment of schedul- 
ing, but the authors contend this is to be expected when information 
flow is to be taken into account. 

TIARA and ARIES The SAFE architecture embodies a number 
of innovations from earlier paper designs. In particular, the TIARA 
design [75] first proposed the idea of a zero-kernel operating system 
and sketched a concrete architecture, while the ARIES project 
proposed using a hardware rule cache to speed up information-flow 
tracking [12]. In TIARA and ARIES, tags had a fixed set of fields 
and were of limited length, whereas, in SAFE, tags are pointers to 
arbitrary data structures, allowing them to represent complex IFC 
labels encoding sophisticated security policies [54], for instance 
decentralized ones [59, 78]. Moreover, unlike TIARA and ARIES, 
which made no formal soundness claims, SAFE proposes a set of 
IFC rules aimed at achieving noninterference; the proof we present 
in this paper, though for a simplified model, provides evidence that 
this goal is within reach. 

RIFLE and other binary-rewriting-based IFC systems RIFLE 
[81] enforces user-specified information-flow policies for x86 bi- 
naries using binary rewriting, static analysis, and augmented hard- 
ware. Binary rewriting is used to make implicit flows explicit; it 
heavily relies on static analysis for reconstructing the program's 
control-flow graph and performing reaching-definitions and alias 
analysis. The augmented hardware architecture associates labels 
with registers and memory and updates these labels on each in- 
struction to track explicit flows. Additional security registers are 
used by the binary translation mechanism to help track implicit 
flows. Beringer [11] recently proved (in Coq) that the main ideas in 
RIFLE can be used to achieve noninterference for a simple While 
language. Unlike RIFLE, SAFE achieves noninterference purely 
dynamically and does not rely on binary rewriting or heroic static 
analysis of binaries. Moreover, the SAFE hardware is generic, sim- 
ply caching instances of software-managed rules. 

While many other information flow tracking systems based 
on binary rewriting have been proposed, few are concerned with 
soundly handling implicit flows [19, 52], and even these do so 
only to the extent they can statically analyze binaries. Since, un- 
like RIFLE (and SAFE), these systems use unmodified hardware, 
the overhead for tracking implicit flows can be large. To reduce this 
overhead, recent systems track implicit flows selectively [42] or not 
at all [40, 66] — arguably a reasonable tradeoff in settings such as 
malware analysis or attack detection, where speed and precision are 
more important than soundness. 

Hardware taint tracking The last decade has seen significant 
progress in specialized hardware for accelerating taint tracking [14, 
20, 21, 25, 26, 80, 82]. Most commonly, a single tag bit is asso- 
ciated with each word to specify if it is tainted or not. Initially 
aimed at mitigating low-level memory corruption attacks by preventing
the use of tainted pointers and the execution of tainted instructions
[14, 20, 80], hardware-based taint tracking has also been
used to prevent high-level attacks such as SQL injection and cross- 
site scripting [21]. In contrast to SAFE, these systems prioritize 
efficiency and overall helpfulness over the soundness of the analy- 
sis, striking a heuristic balance between false positives and false 
negatives (missed attacks). As a consequence, these systems ig- 
nore implicit flows and often don't even track all explicit flows. 
While early systems supported a single hard-coded taint propa- 
gation policy, recent ones allow the policy to be defined in soft- 
ware [21, 26, 82] and support monitoring policies that go beyond 
taint tracking [15, 25, 26, 68]. Harmoni [26], for example, provides 
a pair of caches that are quite similar to the SAFE rule cache. Pos- 
sibly these could even be adapted to enforcing noninterference, in 
which case we expect the proof methodology introduced here to 
apply. 

Timing and termination Our TINI property ignores both termi- 
nation and timing: a program that diverges, fails, or takes vary- 
ing amounts of time to run based on a sensitive input is con- 
sidered secure. The full SAFE design includes a clearance-based 
access-control mechanism [79] for addressing termination and tim- 
ing covert channels (i.e., high-bandwidth channels through which 
malicious code can exfiltrate secrets it directly has access to). Ste- 
fan et al. [77] have also shown that in a concurrent setting such 
leaks can be prevented by an adapted IFC mechanism, at the risk of 
spawning very large numbers of threads. We believe that this IFC 
mechanism could also be enforced using the hardware mechanisms 
we describe here. A recently proposed technique for instruction- 
based scheduling [13, 76] is aimed at preventing leaks via the in- 
ternal timing side-channel (e.g., malicious code sharing the same 
processor inferring secrets through timing variations arising from 
cache misses); this could probably be adapted to SAFE. Finally, 
several mechanisms have been proposed for mitigating the external 
timing side-channel (i.e., leakage of secrets to an attacker making 
timing observations over the network) and thus reducing the rate at 
which bits can be leaked [3, 88]. We do not consider any of these 
attacks or mitigations in this work. 

Verification of low-level code The distinctive challenge in verify- 
ing machine code is coping with unstructured control flow. Our ap- 
proach using structured generators to build the fault handler is sim- 
ilar to the mechanisms used in Chlipala's Bedrock system [16, 17] 
and by Jensen et al. [41], but there are several points of difference. 
These systems each build macros on top of a powerful low-level 
program logic for machine code (Ni and Shao's XCAP [63], in 
the case of Bedrock), whereas we take a simpler, ad-hoc approach, 
building directly on our stack machine's relatively high-level se- 
mantics. Both these systems are based on separation logic, which 
we can do without since (at least in the present simplified model) 
we have very few memory operations to reason about. We have 
instead focused on developing a simple Hoare logic specifically 
suited to verifying structured runtime-system code; e.g., we omit 
support for arbitrary code pointers, but add support for reasoning 
about termination. We use total-correctness Hoare triples (similar 
to Myreen and Gordon [62]) and weakest preconditions to guaran- 
tee progress, not just safety, for our handler code. Finally, our level 
of automation is much more modest than Bedrock's, though still 
adequate to discharge most verification conditions on straight-line 
stack manipulation code rapidly and often automatically. 

Previous SAFE work on testing noninterference The abstract 
machine in §3 was proposed by Hriţcu et al. [38]. At the abstract-machine
level, our development adds dynamic allocation (§11),
which makes the noninterference proof more challenging. Our 
main concerns, though, are the concrete machine, the IFC fault 
handler, and the key properties of this combination, all of which 
are novel. Moreover, Hriţcu et al.'s goal was to explore a methodology
for random testing of noninterference; here, we are interested
in full-blown proofs. 

13. Conclusions and Future Work 

We have presented a formal model of the key IFC mechanisms of 
the SAFE system: propagating and checking tags to enforce se- 
curity, using a hardware cache for common-case efficiency and a 
software fault handler for maximum flexibility. To formalize and 
prove properties at such a low level (including features such as 
dynamic memory allocation and labels represented by pointers to 
in-memory data structures), we first construct a high-level abstract 
specification of the system, then refine it in two steps into a realistic 
concrete machine. A bidirectional refinement methodology allows 
us to prove (i) that the concrete machine, loaded with the right fault 
handler (i.e., one correctly implementing the IFC enforcement of the
abstract specification), satisfies a traditional notion of termination-insensitive
noninterference, and (ii) that the concrete machine reflects
all the behaviours of the abstract specification. Our formalization
reflects the programmability of the fault handling mechanism,
in that the fault handler code is compiled from a rule table
written in a small DSL. We set up a custom Hoare logic to specify 
and verify the corresponding machine code, following the structure 
of a simple compiler for this DSL. 

The development in this paper concerns three deterministic ma- 
chines and simplifies away concurrency. While the lack of concur- 
rency is a significant current limitation that we would like to re- 
move as soon as possible by moving to a multithreading single-core 
model, we still want to maintain the abstraction layers of a proof- 
by-refinement architecture. This requires some care so as not to run 
afoul of the refinement paradox [39] since some standard notions of 
noninterference (for example possibilistic noninterference) are not 
preserved by refinement in the presence of non-determinism. One 
promising path toward this objective is inspired by the recent non- 
interference proof for seL4 [57, 58]. If we manage to share a com- 
mon thread scheduler between the abstract and concrete machines, 
we could still prove a strong double refinement property (concrete 
refines abstract and vice versa) and hence preserve a strong notion 
of noninterference (such as the TINI notion from this work) or a 
possibilistic variation. 

Although this paper focuses on IFC and noninterference, the 
tagging facilities of the concrete machine are completely generic. In 
current follow-on work, we aim to show that the same hardware can 
be used to efficiently support completely different policies targeting 
memory safety and control-flow integrity. Moreover, although the 
rule cache / fault handler design arose in the context of SAFE, we 
believe that this mechanism can also be ported to more traditional 
architectures. In the future, we plan to reuse and extend the formal 
development in this paper both to a larger set of high-level proper- 
ties and to more conventional architectures. For instance, we expect 
the infrastructure for compiling DSLs to fault handler software us- 
ing verified structured code generators to extend to runtime-system 
components (e.g., garbage collectors and device drivers), beyond
IFC and SAFE. 